Optimal Instrumental Variables Estimation for ARMA Models

By  Guido  M.  Kuersteiner1 

In this paper a new class of Instrumental Variables estimators for linear
processes and in particular ARMA models is developed. Previously, IV estimators
based on lagged observations as instruments have been used to account
for unmodelled MA(q) errors in the estimation of the AR parameters. Here it
is shown that these IV methods can be used to improve efficiency of linear time
series estimators in the presence of unmodelled conditional heteroskedasticity.
Moreover, an IV estimator for both the AR and MA parts is developed. One
consequence of these results is that Gaussian estimators for linear time series
models are inefficient members of this IV class. A leading example of an inefficient
member is the OLS estimator for AR(p) models, which is known to be
efficient under homoskedasticity.

Keywords:  ARMA,  conditional  heteroskedasticity,  instrumental  variables,  efficiency  lower- 
bound,  frequency  domain. 
1.  Introduction2

This  paper  considers  instrumental  variables  (IV)  estimators  for  linear  time  series  models. 
Efficient  estimation  in  this  framework  has  been  studied  by  Hayashi  and  Sims  (1983),  Stoica, 
Soderstrom  and  Friedlander  (1985)  and  Hansen  and  Singleton  (1991,  1996).  In  these  papers 
efficient  estimation  of  autoregressive  roots  under  the  presence  of  moving  average  errors  has 
been  analyzed.  The  moving  average  part  of  the  model  is  not  estimated  but  rather  treated 
as  a  nuisance  parameter.  The  class  of  instruments  is  restricted  to  linear  functions  of  past 
observations.  It  is  also  assumed  in  this  literature  that  the  innovations  are  conditionally 
homoskedastic. 

Here it is shown that the same class of IV estimators based on linear functions of past
observations  can  be  used  to  improve  efficiency  of  estimators  for  linear  time  series  models  in 
the  presence  of  unmodelled  conditional  heteroskedasticity.  A  consequence  of  the  results  of 
this  paper  is  that  standard  estimators  of  linear  process  models  based  on  Gaussian  Pseudo 
Likelihood  functions  are  inefficient  GMM  estimators  if  the  innovations  are  conditionally 
heteroskedastic.  This  means  in  particular  that  OLS  estimators  for  AR(p)  models  are 
inefficient  GMM  estimators  if  the  innovations  are  heteroskedastic. 

In  addition  the  paper  extends  the  current  literature  in  two  directions.  First,  an  IV 
estimator  for  general  linear  models,  including  MA(q)  parts  of  ARMA  models,  is  introduced 
under  the  assumption  of  conditionally  heteroskedastic  innovations.  Second,  for  the  class  of 
IV  estimators  with  linear  instruments  the  paper  derives  exact  functional  forms  of  optimal 
filters  of  the  type  developed  in  Hansen  and  Singleton  (1991)  for  a  simpler  estimation 
problem.  It  is  shown  how  the  filters  depend  on  fourth  order  cumulants  of  the  innovation 
distribution  and  the  impulse  response  function  of  the  underlying  process.  This  formulation 
allows us to give exact conditions on the distribution of the error process under which optimal
instrumental  variables  estimators  are  feasible.  A  detailed  analysis  of  the  properties  of  the 
optimal  weight  matrix  is  provided. 

The results in this paper are presented for the case of martingale difference innovations
driving the linear process. Alternatively, similar formulas with the same efficiency
implications  could  be  obtained  under  the  weaker  assumption  of  white  noise  innovations. 
In  this  case  the  space  of  permissible  instruments  is  generated  by  all  linear  combinations  of 
past  observations  and  the  efficiency  bounds  developed  here  are  identical  to  the  bounds  of 
Hansen  (1985)  and  Hansen,  Heaton  and  Ogaki  (1988).  In  the  case  of  martingale  difference 
innovations  Hansen's  bounds  are  based  on  a  larger  class  of  instruments  and  are  therefore 
tighter  than  the  bounds  obtained  here. 

A detailed analysis of the linear class of instruments is justified by the fact that the
Gaussian estimators are members of this class. Any IV procedure dominating the Gaussian
estimators therefore has to contain these linear instruments in the set of all instruments
used.

2This paper is partly based on results in my Ph.D. dissertation at Yale University, 1997. I wish to
thank Peter C.B. Phillips for continued encouragement and support. I have also benefited from comments
by Donald Andrews, Jinyong Hahn, Jerry Hausman, Whitney Newey, Oliver Linton, Chris Sims and
participants of the econometrics workshop at Yale and the University of Pennsylvania. I benefited from
helpful comments of three anonymous referees on an earlier paper that inspired the current extensions. All
remaining errors are my own. Financial support from an Alfred P. Sloan Doctoral Dissertation Fellowship
is gratefully acknowledged.

The  main  technical  difficulty  in  extending  previous  procedures  to  the  estimation  of 
the  moving  average  case  lies  in  the  consistency  proof.  We  give  a  general  characterization 
of  instrument  processes  that  lead  to  consistent  estimators.  We  then  establish  that  the 
optimal  instrument  satisfies  these  criteria. 

In  this  paper  we  do  not  focus  on  implementation  issues.  For  most  parts  of  the  analysis 
it  is  assumed  that  the  optimal  instrument  is  known  a  priori.  It  is  clear  that  in  practice 
a  procedure  for  estimation  of  the  weight  matrix  is  needed.  In  Kuersteiner  (1997)  such  a 
feasible  procedure  is  developed  under  stronger  assumptions  about  the  joint  distribution 
of  the  error  process.  If  these  assumptions  are  satisfied  then  the  procedures  developed 
in  Kuersteiner  (1997)  can  be  directly  applied  to  the  present  context.  Explicit  formulas 
are  provided  for  this  case.  We  also  give  an  exact  formula  for  a  feasible  version  of  the 
optimal  procedure  under  the  more  general  conditions  analyzed  in  this  paper.  In  this  case 
the  feasible  estimator  depends  on  a  bandwidth  parameter.  A  maximal  rate  of  expansion 
of  this  parameter  for  the  estimator  to  maintain  its  first  order  asymptotic  properties  is 
provided.  However,  optimal  bandwidth  selection  procedures  are  beyond  the  scope  of  the 
paper. 

The  paper  is  organized  as  follows.  Section  2  introduces  the  assumptions  about  the 
innovation sequence and specifies the inference problem. Section 3 develops an instrumental
variables estimator for estimation of linear process models and proves consistency
and  asymptotic  normality  of  estimators  for  the  ARMA  class.  In  Section  4  it  is  shown 
how  to  factorize  the  asymptotic  covariance  matrix  of  this  class  of  instrumental  variables 
estimators in a way to obtain a lower bound. Section 5 uses the lower bound to obtain an
explicit  formulation  of  the  optimal  IV  estimator  depending  on  the  data  periodogram  and 
an  optimal  frequency  domain  filter.  Proofs  of  some  important  lemmas  are  contained  in 
Appendix  A  while  the  proofs  of  the  results  in  the  paper  are  contained  in  Appendix  B. 

2.  Model  Specification 

The econometrician observes a finite stretch of data $\{y_t\}_{t=1}^{n}$ which is generated by the
following mechanism

$$y_t = \sum_{j=0}^{\infty} c(\beta, j)\, \varepsilon_{t-j} \qquad (2.1)$$

for a given $\beta = \beta_0 \in \mathbb{R}^d$ and $c(\beta, j) : \mathbb{R}^d \times \mathbb{N} \to \mathbb{R}$. The parameter $\beta_0$ is unknown but the
functions $c(\cdot, j)$ are known. We define the lag polynomial $C(\beta, z) = \sum_{j=0}^{\infty} c(\beta, j) z^j$ and
impose the identifying restriction $c(\beta, 0) = 1$.

The innovations $\varepsilon_t$ are assumed to be a martingale difference sequence. The martingale
difference  property  imposes  restrictions  on  the  fourth  order  cumulants.  These  restrictions 
can  be  conveniently  summarized  by  defining  the  following  function 


It should be emphasized that $\sigma(s,r)$ is equal to the fourth order cumulant for $s, r > 0$. Let

$$\alpha_{s,r} = \begin{cases} \sigma(s,r) & \text{if } s \neq r \\ \sigma(r,r) + \sigma^4 & \text{if } s = r \end{cases} \qquad (2.3)$$

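We assume that we have a probability space $(\Omega, \mathcal{F}, P)$ with a filtration $\mathcal{F}_t$ of increasing
$\sigma$-fields such that $\mathcal{F}_t \subseteq \mathcal{F}_{t+1} \subseteq \mathcal{F}$ for all $t$. The doubly infinite sequence of random variables
$\{\varepsilon_t\}_{t=-\infty}^{\infty}$ generates the filtration $\mathcal{F}_t$ such that $\mathcal{F}_t = \sigma(\varepsilon_t, \varepsilon_{t-1}, \ldots)$. The assumptions on
$\{\varepsilon_t\}_{t=-\infty}^{\infty}$ are summarized as follows: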

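Assumption A1. (i) $\varepsilon_t$ is strictly stationary and ergodic, (ii) $E(\varepsilon_t \mid \mathcal{F}_{t-1}) = 0$ almost
surely, (iii) $E(\varepsilon_t^2 \mid \mathcal{F}_{t-1}) = \sigma_t^2$ almost surely where $\sigma_t^2$ is not constant, (iv) $E(\varepsilon_t^2) = \sigma^2 < \infty$,
(v) $\sum_{s=1}^{\infty} \sum_{r=1}^{\infty} |\sigma(s,r)| = B < \infty$, (vi) $E(\varepsilon_t^2 \varepsilon_{t-s}^2) > \alpha$ for some $\alpha > 0$ for all $s$.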

Remark 1. Assumption A1(ii) could be relaxed to $E\varepsilon_t\varepsilon_s = 0$ for $t \neq s$ at the cost of
slightly more complicated expressions for the optimal instruments. Assumption A1(iii)
states that the second moments are conditionally heterogeneous. A consequence is that
terms of the form $E(\varepsilon_t^2 \varepsilon_{t-s} \varepsilon_{t-r})$ are nonzero for $s \neq r \neq 0$ and depend on $s$ for $s = r \neq 0$.
Assumption  (v)  limits  the  dependence  in  higher  moments  by  imposing  a  summability 
condition  on  the  fourth  cumulants.  The  assumption  is  needed  to  prove  invertibility  of  the 
infinite  dimensional  weight  matrix  of  the  optimal  GMM  estimator.  Assumption  (vi)  is  not 
restrictive.  Its  only  purpose  is  to  guarantee  that  the  innovation  distribution  does  not  have 
all  its  mass  concentrated  at  zero. 

Remark 2. It can be checked that processes in the ARCH, GARCH, EGARCH and
stochastic volatility class satisfy the assumptions, provided the parametrization implies
that $E\varepsilon_t^4 < \infty$. It is well known from Milhoj (1985) or Nelson (1990) that this condition
is  satisfied  only  if  additional  restrictions  limiting  the  temporal  dependence  of  conditional 
variances  and/or  the  innovation  distribution  are  imposed  on  the  parameter  space. 

By definition of the conditional expectation operator, $\sigma_t^2$ is $\mathcal{F}_{t-1}$ measurable.
Assumption (A1) implies that $\varepsilon_t^2$ is strictly stationary and ergodic and therefore covariance
stationary.  It  should  be  emphasized  that  no  assumptions  about  third  moments  are  made. 
In  particular  this  allows  for  skewness  in  the  error  process. 
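
To make the innovation assumptions concrete, the following sketch simulates an ARCH(1) sequence, one of the classes mentioned in Remark 2, and computes sample analogs of $E(\varepsilon_t^2 \varepsilon_{t-s}^2)$. The function names, the Gaussian shocks and the parameter values (`omega=1.0`, `alpha=0.5`, chosen so that the fourth moment is finite) are illustrative assumptions and do not come from the paper.

```python
import numpy as np

def simulate_arch1(n, omega=1.0, alpha=0.5, seed=0, burn=500):
    """Simulate eps_t = sigma_t * u_t with sigma_t^2 = omega + alpha * eps_{t-1}^2."""
    rng = np.random.default_rng(seed)
    u = rng.standard_normal(n + burn)      # i.i.d. N(0,1) shocks (an assumption)
    eps = np.zeros(n + burn)
    for t in range(1, n + burn):
        sigma2_t = omega + alpha * eps[t - 1] ** 2   # depends only on the past
        eps[t] = np.sqrt(sigma2_t) * u[t]            # so E(eps_t | past) = 0
    return eps[burn:]                      # drop the burn-in draws

def lagged_moment(eps, s):
    """Sample analog of E(eps_t^2 * eps_{t-s}^2) for a lag s >= 1."""
    return np.mean(eps[s:] ** 2 * eps[:-s] ** 2)

eps = simulate_arch1(100_000)
sigma2 = np.mean(eps ** 2)                 # estimates omega / (1 - alpha) = 2
# Under conditional homoskedasticity these differences would be close to zero.
excess = [lagged_moment(eps, s) - sigma2 ** 2 for s in (1, 2, 3)]
```

In this design the innovations are serially uncorrelated, yet the squared innovations are positively correlated across lags, which is the situation the assumptions are meant to cover.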

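For the special case of an ARMA(p, q) process, the lag polynomial has the familiar
rational form

$$C(\beta, z) = \frac{\theta(z)}{\phi(z)} \qquad (2.4)$$

with $\theta(z) = 1 - \theta_1 z - \ldots - \theta_q z^q$ and $\phi(z) = 1 - \phi_1 z - \ldots - \phi_p z^p$ and $\beta' = (\phi_1, \ldots, \phi_p, \theta_1, \ldots, \theta_q)$.
Let $g_{yy}(\beta, \lambda) = |C(\beta, e^{i\lambda})|^2$ where $|z| = (z z^*)^{1/2}$ for $z \in \mathbb{C}$ and $z^*$ is the complex conjugate
of $z$. Under Assumption (A1), the spectrum of $y_t$ is given by $f_{yy}(\beta, \lambda) = \frac{\sigma^2}{2\pi} g_{yy}(\beta, \lambda)$.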
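
As an illustration of (2.1) and (2.4), the sketch below computes the impulse response coefficients $c(\beta, j)$ of an ARMA model from the identity $\phi(z) C(\beta, z) = \theta(z)$, together with $g_{yy}(\beta, \lambda)$. The function names and the example parameter values are hypothetical; the sign convention follows the definitions of $\theta(z)$ and $\phi(z)$ given above.

```python
import numpy as np

def arma_impulse_response(phi, theta, n_lags):
    """Coefficients c(beta, j), j = 0..n_lags-1, of C(z) = theta(z) / phi(z)."""
    phi = np.asarray(phi, dtype=float)
    theta = np.asarray(theta, dtype=float)
    c = np.zeros(n_lags)
    c[0] = 1.0                                   # identifying restriction c(beta, 0) = 1
    for j in range(1, n_lags):
        ar_part = sum(phi[i - 1] * c[j - i] for i in range(1, min(j, len(phi)) + 1))
        ma_part = theta[j - 1] if j <= len(theta) else 0.0
        c[j] = ar_part - ma_part                 # match powers of z in phi(z) C(z) = theta(z)
    return c

def g_yy(phi, theta, lam):
    """g_yy(beta, lambda) = |C(beta, e^{i lambda})|^2 for an ARMA model."""
    z = np.exp(1j * np.asarray(lam, dtype=float))
    theta_z = 1.0 - sum(t * z ** (k + 1) for k, t in enumerate(theta))
    phi_z = 1.0 - sum(p * z ** (k + 1) for k, p in enumerate(phi))
    return np.abs(theta_z / phi_z) ** 2

# ARMA(1,1) with phi_1 = 0.5 and theta_1 = 0.4 (illustrative values)
c = arma_impulse_response([0.5], [0.4], 200)
```

For the ARMA(1,1) example the first coefficients are 1, 0.1, 0.05 and 0.025, and summing the truncated series $\sum_j c(\beta, j) e^{i\lambda j}$ reproduces `g_yy` up to a negligible truncation error.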

Further restrictions on $C(\beta, e^{i\lambda})$ are needed to ensure identification of the model and for
consistency and asymptotic normality of the estimators. The necessary assumptions are
discussed in Hannan (1973), Dunsmuir and Hannan (1976), and Deistler, Dunsmuir and
Hannan (1978). As shown in these articles, a careful distinction between convergence of
the parameters in $c(\beta, j)$ and the structural form parameters is needed. Consistency proofs
typically establish convergence in the pointwise topology. An identification condition is
then needed to obtain convergence in the quotient topology.

Some of the results of this paper are presented for the general formulation $C(\beta, z)$.
At some points however a specialization to the ARMA case is made in order to obtain
sharper results. This is especially the case for the consistency proof. In that case abstract
high level assumptions can be made precise for the specific functional form of the ARMA
model.

In the general case the functions $c(\beta, j) \in C([\mathbb{R}^d \times \mathbb{N}], \mathbb{R})$ are restricted to satisfy the
following additional constraints.

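Assumption B1. Let $C(\beta, z) = \sum_{j=0}^{\infty} c(\beta, j) z^j$. The parameter space $\Theta$ is a subset of $\mathbb{R}^d$
defined by $\Theta = \{\beta \in \mathbb{R}^d : |C(\beta, z)|^{-2} \neq 0 \text{ for } |z| \le 1, \ |C(\beta, z)|^2 \neq 0 \text{ for } |z| \le 1\}$. Assume
that $\Theta$ is open in $\mathbb{R}^d$. Let the compact closure of $\Theta$ in $\mathbb{R}^d$ be denoted by $\bar{\Theta}$. Assume
$\beta_0 \in \Theta$. The coefficients $c(\beta, j)$ are twice continuously differentiable in $\beta \in \Theta$ for all $j$ and
$c(\beta, 0) = 1$. We require for $\beta \in \Theta$ that $\sum_{j=0}^{\infty} |j|\, |c(\beta, j)| < \infty$ and $\sum_{j=0}^{\infty} |j|\, |\frac{\partial}{\partial \beta} c(\beta, j)| < \infty$.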

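Assumption B2. For all $\beta \in \Theta$, $g_{yy}(\beta_0, \lambda) \neq g_{yy}(\beta, \lambda)$ whenever $\beta \neq \beta_0$ for some subsets
$L \subset [-\pi, \pi]$ with nonzero Lebesgue measure. Let $\partial\Theta = \bar{\Theta} \setminus \Theta$ and consider any convergent
sequence $\beta_n \in \Theta$, $\beta_n \to \beta \in \partial\Theta$. Then $\liminf_n \left| \int_{-\pi}^{\pi} C^{-1}(\beta_n, e^{-i\lambda}) f(e^{i\lambda})\, d\lambda \right| > 0$ for some
complex valued $f(z)$ such that $f(z) = \sum_{k=-\infty}^{\infty} f_k z^k$ with $\sum_{k=-\infty}^{\infty} |f_k| < \infty$.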

Assumption B3. For a neighborhood $U$ of $\beta_0$, $U \subset \Theta$, $\partial^2 g_{yy}(\beta, \lambda)/\partial\beta\,\partial\beta'$ is continuous
in $\lambda \in [-\pi, \pi]$ and $\beta \in U$.

Remark 3. Assumption (B1) implies that the functions $g_{yy}(\beta, \lambda)$ and $\partial g_{yy}(\beta, \lambda)/\partial\beta$ are
Lipschitz continuous. The Lipschitz condition also implies that $g_{yy}^{-1}(\beta, \lambda)$ is Lipschitz
continuous on closed subsets of $\Theta$ and therefore that $\frac{\partial}{\partial\beta} \ln g_{yy}(\beta, \lambda)$ is Lipschitz continuous
on closed subsets of $\Theta$.

Remark  4.  Assumption  (Bl)  is  stronger  than  C2.2  in  Dunsmuir  (1979)  where  on  the  other 
hand  conditional  homoskedasticity  is  assumed.  The  stronger  summability  restrictions  are 
needed  to  justify  approximations  based  on  the  innovation  sequence. 

The assumptions specified here are sufficient to identify the parameters $\beta$ in $C(\beta, e^{i\lambda})$.
For specific functional forms of $C(\beta, e^{i\lambda})$ the assumptions can be made more explicit. A
leading example is the ARMA model where the identifiable subset of $\mathbb{R}^d$ can be described
more accurately. The following Assumption is equivalent to the previous assumptions for
the case of an ARMA model.

Assumption B4. Let $C(\beta, z) = \theta(z)/\phi(z)$. The parameter space $\Theta$ is a subset of $\mathbb{R}^d$
defined by $\Theta = \{\beta \in \mathbb{R}^d : \phi(z) \neq 0 \text{ for } |z| \le 1, \ \theta(z) \neq 0 \text{ for } |z| \le 1, \ \theta(z), \phi(z) \text{ have no common zeros}, \ \theta_q \neq 0, \ \phi_p \neq 0\}$.
Let the compact closure of $\Theta$ in $\mathbb{R}^d$ be denoted by $\bar{\Theta}$.


Remark 5. Deistler, Dunsmuir and Hannan (1978) show that $\Theta$ and $\bar{\Theta}$ defined in Assumption
(B4) satisfy the topological properties required in Assumption (B1). It is easy
to show that all ARMA models in $\Theta$ satisfy the summability and differentiability requirements
of (B1). The only new condition is $\liminf_n \left| \int_{-\pi}^{\pi} C^{-1}(\beta_n, e^{-i\lambda}) f(e^{i\lambda})\, d\lambda \right| > 0$. Since
$C(\beta, e^{i\lambda})$ can be zero on the boundary of the parameter space we cannot expect the integral
to be defined on the boundary in general. The condition requires that the behavior of
$C^{-1}(\beta_n, e^{i\lambda})$ is not too irregular as $\beta_n \to \beta \in \partial\Theta$. For the ARMA class this condition is
satisfied. It is enough to consider the MA(q) case. The integral $\left| \int_{-\pi}^{\pi} \theta_n(e^{-i\lambda})^{-1} f(e^{i\lambda})\, d\lambda \right|$
diverges to infinity as more than one of the roots of $\theta_n(e^{i\lambda})$ approach unity and converges
to a constant if one root approaches unity. To see this let $\xi_{jn}$ denote the roots of $\theta_n(e^{i\lambda})$

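such that $\theta_n(e^{i\lambda}) = \prod_{j=1}^{q} (1 - \xi_{jn} e^{i\lambda})$ with $\theta_n(e^{i\lambda})^{-1} = \prod_{j=1}^{q} \sum_{k=0}^{\infty} \xi_{jn}^{k} e^{i\lambda k}$ and $f(e^{i\lambda}) = \sum_{k=-\infty}^{\infty} f_k e^{i\lambda k}$,
which leads to $\int_{-\pi}^{\pi} \theta_n(e^{-i\lambda})^{-1} f(e^{i\lambda})\, d\lambda = 2\pi \sum_{k_1=0}^{\infty} \cdots \sum_{k_q=0}^{\infty} \xi_{1n}^{k_1} \cdots \xi_{qn}^{k_q} f_{k_1 + \cdots + k_q}$.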

In  the  following  analysis  of  the  IV  estimator  results  will  first  be  obtained  for  the  general 
linear  process  case.  It  will  then  be  shown  that  high  level  assumptions  needed  for  these 
results  are  satisfied  for  the  case  when  Assumptions  (B1-B3)  are  specialized  to  (B4). 

3.  Instrumental  Variables  Estimators 

In  this  section  a  class  of  instrumental  variables  estimators  is  introduced.  The  instruments 
are constructed from linear filters of lagged innovations $\varepsilon_t$. An alternative, equivalent
formulation would be to allow for linear filters of the observable process $y_t$. Estimators
of  this  form  have  been  proposed  by  Hayashi  and  Sims  (1983),  Stoica,  Soderstrom  and 
Friedlander  (1985)  and  Hansen  and  Singleton  (1991). 

Restricting the instruments to the linear class has implications for the efficiency properties
of the estimators. It rules out conditional GLS transformations and ML estimators for
parametric cases. Linearity, on the other hand, leads to a tractable theory. Introduce the
space of absolutely summable sequences $l^1$ such that $x \in l^1$ if $\sum_j |x_j| < \infty$ for $x = \{x_j\}_{j=1}^{\infty}$.
Define the set $\mathcal{A}$ of sequences of vectors $a_j \in \mathbb{R}^d$ such that

$$\mathcal{A} = \left\{ a = \{a_j\}_{j=1}^{\infty} : a_j \in \mathbb{R}^d, \ \{[a_j]_k\}_{j=1}^{\infty} \in l^1 \text{ for all } 1 \le k \le d \right\}$$

where $[\cdot]_k$ denotes the $k$-th element of a vector. We define $z_t \in \mathbb{R}^d$ as

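$$z_t = \sum_{k=1}^{\infty} a_k \varepsilon_{t-k} \quad \text{a.s.}$$

for $a \in \mathcal{A}$, $a$ fixed. The instruments satisfy the orthogonality condition

$$E\left[ \left( C^{-1}(\beta_0, L)\, y_t \right) z_t \right] = 0 \qquad (3.1)$$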

since $C^{-1}(\beta_0, L)\, y_t = \varepsilon_t$ from (2.1). The estimator based on this condition is constructed
in the time domain. If $C^{-1}(\beta_0, L)$ is of infinite order, as is the case for MA(q) models, a
sample analog to (3.1) needs to be based on an approximation. Such an approximation
can be conveniently analyzed in the frequency domain. It should be stressed however that
the estimator is set up in the time domain. Let the expansion of the polynomial $C^{-1}(\beta, z)$ be
$C^{-1}(\beta, z) = \sum_{j=0}^{\infty} c_j z^j$. The sample analog of the moment restriction is then given by

$$G_n(\beta, a) = \frac{1}{n} \sum_{t=1}^{n} z_t \sum_{j=0}^{t-1} c_j\, y_{t-j} \qquad (3.2)$$

for all $a \in \mathcal{A}$. From (3.1) we see that $z_t$ has to be approximated as well. Discussion of this
issue will be delayed to Section 5 where an optimal instrument is considered. For the time
being it is therefore assumed that $z_t$ is known.
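In the frequency domain the analog of (3.1) is

$$\int_{-\pi}^{\pi} C^{-1}(\beta_0, e^{-i\lambda})\, f_{yz}(\lambda)\, d\lambda = 0$$

where $f_{yz}(\lambda) = \sum_{j=-\infty}^{\infty} \gamma_{yz}(j) e^{-i\lambda j}$ and $\gamma_{yz}(j) = E y_t z_{t-j}$. We set

$$G(\beta, a) = (2\pi)^{-1} \int_{-\pi}^{\pi} C^{-1}(\beta, e^{-i\lambda})\, f_{yz}(\lambda)\, d\lambda.$$

Note that $f_{yz}(\lambda)$ typically is a complex vector valued function $f_{yz}(\lambda) : [-\pi, \pi] \to \mathbb{C}^d$. Also
note that $\int_{-\pi}^{\pi} C^{-1}(\beta, e^{-i\lambda})\, f_{yz}(\lambda)\, d\lambda$ is real valued.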

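We introduce discrete Fourier transforms of the data defined as $\omega_{n,y}(\lambda) = \frac{1}{\sqrt{n}} \sum_{t=1}^{n} y_t e^{-it\lambda}$
and for the instrument as $\omega_{n,z}(\lambda) = \frac{1}{\sqrt{n}} \sum_{t=1}^{n} z_t e^{-it\lambda}$. The cross periodogram is
$I_{n,yz}(\lambda) = \omega_{n,y}(\lambda)\, \omega_{n,z}(-\lambda)$. It is easy to check that $G_n(\beta, a)$ defined in (3.2) is identical to

$$G_n(\beta, a) = (2\pi)^{-1} \int_{-\pi}^{\pi} C^{-1}(\beta, e^{-i\lambda})\, I_{n,yz}(\lambda)\, d\lambda.$$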
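
The equivalence between the time domain form (3.2) and the periodogram form can be checked numerically. The sketch below does this for a scalar instrument ($d = 1$) in an MA(1) design; the function names, the choice $z_t = \varepsilon_{t-1}$ and the parameter values are assumptions made for illustration. Because only $c_0, \ldots, c_{n-1}$ enter (3.2), the filter is truncated at lag $n - 1$ and the integral is evaluated as an average over $2n$ equally spaced frequencies, which is exact for the resulting trigonometric polynomial.

```python
import numpy as np

def gn_time(c, y, z):
    """(3.2): n^{-1} sum_t z_t sum_{j=0}^{t-1} c_j y_{t-j} (arrays are 0-based)."""
    n = len(y)
    total = 0.0
    for t in range(n):
        # y[t::-1] is y_t, y_{t-1}, ..., y_1 in the paper's indexing
        total += z[t] * np.dot(c[: t + 1], y[t::-1])
    return total / n

def gn_freq(c, y, z):
    """(2 pi)^{-1} * integral of C^{-1}(e^{-i lam}) I_{n,yz}(lam) over one period."""
    n = len(y)
    M = 2 * n                                      # enough grid points to avoid aliasing
    lam = 2.0 * np.pi * np.arange(M) / M
    E = np.exp(-1j * np.outer(lam, np.arange(1, n + 1)))
    w_y = E @ y / np.sqrt(n)                       # omega_{n,y}(lam)
    w_z = np.conj(E) @ z / np.sqrt(n)              # omega_{n,z}(-lam)
    c_inv = np.exp(-1j * np.outer(lam, np.arange(n))) @ c[:n]
    return np.mean(c_inv * w_y * w_z).real         # grid average = (2 pi)^{-1} * integral

rng = np.random.default_rng(0)
n, theta = 50, 0.5
eps = rng.standard_normal(n + 1)
y = eps[1:] - theta * eps[:-1]     # MA(1): y_t = eps_t - theta * eps_{t-1}
z = eps[:-1]                       # instrument z_t = eps_{t-1}
c = theta ** np.arange(n)          # expansion of 1 / (1 - theta z)
g_time, g_freq = gn_time(c, y, z), gn_freq(c, y, z)
```

The two values agree up to floating point error, so the frequency domain representation is a restatement of the time domain estimator rather than a different estimator.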

We follow Hansen (1982) in defining the estimator $\hat{\beta}_n$ as the solution to

$$\hat{\beta}_n = \arg\min_{\beta \in \Theta} \|G_n(\beta, a)\|^2. \qquad (3.3)$$
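
A minimal sketch of the estimator (3.3) for an MA(1) model with a scalar instrument follows. It is not the paper's implementation: the instrument $z_t = \varepsilon_{t-1}$ is treated as known (which is only possible in a simulation), the minimization is a plain grid search, and all names and parameter values are hypothetical.

```python
import numpy as np

def gn_ma1(theta_grid, y, z):
    """G_n(theta, a) of (3.2) for an MA(1), evaluated on a grid of theta values."""
    e = np.zeros_like(theta_grid)     # e_t(theta) = sum_{j<t} theta^j y_{t-j}
    g = np.zeros_like(theta_grid)
    for y_t, z_t in zip(y, z):
        e = y_t + theta_grid * e      # recursion implementing C^{-1}(theta, L) y_t
        g += z_t * e                  # accumulate z_t * e_t(theta)
    return g / len(y)

rng = np.random.default_rng(1)
n, theta0 = 4000, 0.5
eps = rng.standard_normal(n + 1)
y = eps[1:] - theta0 * eps[:-1]       # y_t = eps_t - theta0 * eps_{t-1}
z = eps[:-1]                          # instrument z_t = eps_{t-1}, assumed known
theta_grid = np.linspace(-0.95, 0.95, 1901)   # grid inside the invertibility region
theta_hat = theta_grid[np.argmin(gn_ma1(theta_grid, y, z) ** 2)]
```

In this design the population moment is proportional to $\theta - \theta_0$, so the criterion has a single zero and the grid minimizer lands close to the true value. A feasible version would have to replace $z_t$ by an approximation.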

Consistency arguments are complicated by the fact that the parameter space for linear
time series models usually is only locally compact. Standard consistency proofs relying on
compactness can therefore not be applied. Hosoya and Taniguchi (1982), Kabaila (1980)
and Taniguchi (1983) assume compactness of the parameter space to avoid consistency
problems. Such an assumption is not valid in the ARMA case. Stationarity restrictions
imply that $\Theta$ is an open subset in $\mathbb{R}^d$ as was shown by Deistler, Dunsmuir and Hannan
(1978).

Huber (1967) probably is the first reference to discuss consistency of M-estimators
when the parameter space is not compact. The formulation there is in terms of low
level assumptions on the criterion function and the data generating process which are
not readily adaptable to the present situation. Hannan's (1973) original paper provides a
consistency proof for the estimators of an ARMA model without assuming compactness.
Unfortunately, his technique for the Gaussian estimators does not readily generalize to
the current context. General consistency results are obtained by Wu (1981), Pakes and
Pollard (1989) and Zaman (1989). The stochastic equicontinuity arguments underlying
these proofs are not applicable in our context due to the discontinuities of the criterion
function on the boundary of the parameter space.

One of the problems is that the criterion function does not necessarily converge on the
compactification $\bar{\Theta}$. The consistency proof used here therefore proceeds by establishing
almost sure bounds for the criterion function along convergent sequences in $\Theta$. It is then
possible to circumvent uncertainty by analyzing convergent subsequences on an outcome
by outcome basis. This method was used by Brockwell and Davis (1987, p. 384) to prove
consistency for estimators for the ARMA model based on quadratic criterion functions.
The details of their proof rely heavily on nonnegativity properties of quadratic forms. For
the IV estimators considered here such arguments are not available and a new proof is
presented. We start by making the following assumptions. Unless otherwise stated all
conditions are for $a \in \mathcal{A}$, $a$ fixed.

Assumption C1. The sequence of estimators $\hat{\beta}_n \in \mathbb{R}^d$ is defined by (3.3).

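Assumption C2. Let the sets $B_k(\beta_0)$ for $k = 1, 2, \ldots$ form a countable local base3 around
$\beta_0$. The sets $B_k(\beta_0)$ can be taken as the set of balls with rational radius centered at $\beta_0$.
Let $z_t = \sum_{k=1}^{\infty} a_k \varepsilon_{t-k}$ a.s. where $\varepsilon_{t-k}$ satisfies Assumption (A1). Let $\mathcal{A}' \subset \mathcal{A}$ be the set of
all sequences $\{a_k\}_{k=1}^{\infty}$ such that

$$\mathcal{A}' = \left\{ a \in \mathcal{A} : \inf_{\beta_n \in B_k(\beta_0)^c} \liminf_{n} \|G(\beta_n, a)\| > 0 \text{ for } k = 1, 2, \ldots \right\}$$

where $B_k(\beta_0)^c$ are the complements of $B_k(\beta_0)$. Assume that $\mathcal{A}' \neq \emptyset$.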

Remark 6. Assumption (C1) is the definition of the estimator. We show in the consistency
proof that $\|G_n(\hat{\beta}_n, a)\| = 0$ almost surely is implied by the assumptions on $G_n$.
(C2) is a familiar identification condition which makes sure that the expectation of the criterion
function is bounded away from zero outside a neighborhood of the true parameter.
However this condition does not hold for all $a \in \mathcal{A}$. We therefore define the subset $\mathcal{A}'$ of
instruments that satisfy the identification condition. We require that this set be nonempty.
Condition (C2) strengthens Assumption (B2) by requiring that $\liminf \|G(\beta_n, a)\| > 0$
holds. Condition (C2) requires in addition that the identification condition holds on the
entire parameter space. This imposes restrictions on $z_t$ or $a$. A complete description of the
set $\mathcal{A}'$ is possible for a given parametric class $C(\beta, z)$. A characterization will be given for
the ARMA case.


3 A collection $\mathcal{B}$ of open subsets of a space $X$ is called a base if for each open set $O \subset X$ and each $x \in O$
there is a set $B \in \mathcal{B}$ such that $x \in B \subset O$. A collection $\mathcal{B}_x$ of open sets containing a point $x$ is called a
local base at $x$ if for each open set $O$ containing $x$ there is a $B \in \mathcal{B}_x$ such that $x \in B \subset O$. Every metric
space has a countable base at each point (see Royden (1988), p. 175).


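Lemma 3.1. Assume (A1), (B1-B3), (C1-C2). Let $z_t = \lim_{m \to \infty} A_m' \varepsilon_t^m$ a.s. with $A_m' = [a_1, \ldots, a_m]$,
$\{a_k\}_{k=1}^{\infty} \in \mathcal{A}'$ and $\varepsilon_t^m = [\varepsilon_{t-1}, \ldots, \varepsilon_{t-m}]'$. Then the estimator defined by
$\hat{\beta}_n = \arg\min \|G_n(\beta, a)\|^2$ is consistent, $\hat{\beta}_n \to \beta_0$ almost surely.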

Consistency of the IV estimator depends both on restrictions on the parameter space
and the instruments $z_t$. Assumption (C2) restricts the class of allowable instruments. The
conditions given are necessarily high level without further restrictions on the function
$C(\beta, L)$. For practical purposes it is however important to characterize the set of instruments
$\mathcal{A}'$ leading to consistent estimators. In the case of an ARMA(p,q) model it is
possible to give conditions on the sequences $a \in \mathcal{A}'$. This is done in the next proposition.

Proposition 3.2. Assume $C(\beta, L) = \theta_0(L)/\phi_0(L)$ is an ARMA(p,q) lag operator and
the parameter space $\Theta$ satisfies Assumption (B4). Let $S = \mathrm{sp}\{x \in l^1 : \phi_0(L)x = 0\}$ be
the span of linearly independent solutions to the difference equation $\phi_0(L)x = 0$. Define
$A^{\perp} = \{x \in l^1 : A'x = 0\}$ for $A = [a_1, \ldots]'$ and $a \in \mathcal{A}$. If $a \in \mathcal{A}$ with $A_d = [a_1, \ldots, a_d]'$ where
$d = p + q$ then the following conditions are sufficient for $a \in \mathcal{A}'$. If $q \ge p \ge 0$ and $A_d$ is
nonsingular and $\sum_{k=1}^{\infty} a_k \neq 0$ then $a \in \mathcal{A}'$. If $0 \le q < p$ then we need $A' = [a_1, \ldots]$ to be of
full row rank, $A^{\perp} \cap S = \{0\}$ and $\sum_{k=1}^{\infty} a_k \neq 0$ for $a \in \mathcal{A}'$.

Remark 7. Proposition (3.2) shows that ARMA models can be consistently estimated by
instrumental variables techniques provided that the instruments satisfy the specified restrictions.
The condition $\sum_{k=1}^{\infty} a_k \neq 0$ is only needed to avoid problems at the boundary
of the parameter space and can be ignored if $\Theta$ is restricted to a compact subset of $\mathbb{R}^d$.

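We now state additional assumptions that are sufficient to establish a result for the
limiting distribution of $\sqrt{n}(\hat{\beta}_n - \beta_0)$. Introduce the notation $\eta(\beta, \lambda) = \partial \ln C(\beta, e^{-i\lambda})/\partial\beta$
and $b_k = (2\pi)^{-1} \int \eta(\beta_0, \lambda) e^{ik\lambda}\, d\lambda$. It follows immediately that $b_{-k} = 0$ and $b_0 = 0$. Let
$l_a(\lambda) = \sum_{k=1}^{\infty} a_k e^{-i\lambda k}$ and define the matrices $P_m' = [b_1, \ldots, b_m]$, $A_m' = [a_1, \ldots, a_m]$ and

$$\Omega_m = \begin{bmatrix} \sigma(1,1) + \sigma^4 & \cdots & \sigma(1,m) \\ \vdots & \ddots & \vdots \\ \sigma(m,1) & \cdots & \sigma(m,m) + \sigma^4 \end{bmatrix} \qquad (3.4)$$

It is easy to check that $\lim_m P_m' A_m = (2\pi)^{-1} \int \eta(\beta_0, \lambda)\, l_a(-\lambda)'\, d\lambda$. The following conditions
are needed to prove the existence of a limiting distribution of $\hat{\beta}_n$.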

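Assumption D1. $\sqrt{n}\, G_n(\hat{\beta}_n, a) = o_p(1)$.

Assumption D2. Define $\mathcal{A}'' \subset \mathcal{A}$ as $\mathcal{A}'' = \{a \in \mathcal{A} \mid \det \int \eta(\beta_0, \lambda)\, l_a(-\lambda)'\, d\lambda \neq 0\}$.
Assume that $\mathcal{A}' \cap \mathcal{A}'' \neq \emptyset$.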

The limiting distribution of the instrumental variables estimator is stated in the next
theorem. For notational efficiency define
$\lim_{m \to \infty} \sigma^{-4} (P_m' A_m)^{-1} A_m' \Omega_m A_m (A_m' P_m)^{-1} = \sigma^{-4} (P'A)^{-1} A' \Omega A (A'P)^{-1}$.
This notation will be justified in the next section in terms of
operators on infinite dimensional spaces.


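Theorem 3.3. Assume (A1), (B1-B3), (C1, C2) and (D1, D2). Let $z_t = \lim_{m \to \infty} A_m' \varepsilon_t^m$
with $A_m' = [a_1, \ldots, a_m]$, $\{a_k\}_{k=1}^{\infty} \in \mathcal{A}' \cap \mathcal{A}''$ and $\varepsilon_t^m = [\varepsilon_{t-1}, \ldots, \varepsilon_{t-m}]'$. Then the estimator
defined by $\hat{\beta}_n = \arg\min \|G_n(\beta, a)\|^2$ has a limiting distribution given by

$$\sqrt{n}(\hat{\beta}_n - \beta_0) \xrightarrow{d} N\left(0, \ \sigma^{-4} (P'A)^{-1} A' \Omega A (A'P)^{-1}\right)$$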

Proof.  See  Appendix  B  ■ 

Remark 8. If $\hat{\beta}_n$ is obtained from minimizing a Gaussian PML criterion function then the
asymptotic covariance matrix is $\sigma^{-4} (P'P)^{-1} P' \Omega P (P'P)^{-1}$. Such an estimator therefore
corresponds to an IV estimator where $A = P$. This shows that Gaussian estimators have the
interpretation of inefficient IV or GMM estimators when the innovations are conditionally
heteroskedastic.

The main result of the paper will now be developed in two steps. We first obtain a
lower bound for the covariance matrix

$$\sigma^{-4} (P'A)^{-1} A' \Omega A (A'P)^{-1} \qquad (3.5)$$

in the next section. This lower bound is then used to construct an optimal instrumental
variables estimator.
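
To illustrate how the covariance matrix (3.5) depends on the instrument, the sketch below evaluates a finite-$m$ version for an AR(1) model, for which $b_k = \phi^{k-1}$. It compares the Gaussian choice $A = P$ of Remark 8 with the choice $A = \Omega^{-1} P$, which by the usual GMM weighting argument gives $\sigma^{-4} (P' \Omega^{-1} P)^{-1}$. The diagonal form of $\Omega_m$, the truncation at $m = 50$ and all numerical values are assumptions for illustration only.

```python
import numpy as np

def iv_covariance(P, A, Omega, sigma2=1.0):
    """Finite-m version of sigma^{-4} (P'A)^{-1} A' Omega A (A'P)^{-1}."""
    PA_inv = np.linalg.inv(P.T @ A)
    # (A'P)^{-1} is the transpose of (P'A)^{-1}
    return PA_inv @ (A.T @ Omega @ A) @ PA_inv.T / sigma2 ** 2

m, phi, sigma2 = 50, 0.5, 1.0
k = np.arange(1, m + 1)
P = (phi ** (k - 1)).reshape(m, 1)     # AR(1): b_k = phi^(k-1), d = 1

# Hypothetical heteroskedastic case: diagonal entries sigma(k,k) + sigma^4 that
# decay towards sigma^4 with the lag, and zero off-diagonal cumulants.
Omega = np.diag(sigma2 ** 2 * (1.0 + 2.0 * 0.5 ** k))

V_gauss = iv_covariance(P, P, Omega, sigma2)                        # A = P
V_opt = iv_covariance(P, np.linalg.solve(Omega, P), Omega, sigma2)  # A = Omega^{-1} P

# Homoskedastic benchmark: Omega = sigma^4 I makes both choices coincide.
V_homo = iv_covariance(P, P, sigma2 ** 2 * np.eye(m), sigma2)
```

In the homoskedastic benchmark the variance reduces to approximately $1 - \phi^2$, the familiar OLS value for an AR(1). With the heteroskedastic $\Omega_m$ the Gaussian choice yields a strictly larger variance than the reweighted instrument.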

4.  Covariance  Matrix  Lowerbound 

Finding a lower bound for (3.5) poses certain technical difficulties having to do with the
infinite dimensional nature of the instrument space. We investigate the properties of the
fourth order cumulant matrix $\Omega_m$, first by holding $m$ fixed and then by looking at a related
infinite dimensional problem. In particular we establish that the infinite dimensional
operator $\Omega$, associated with $\Omega_m$ in a way to be defined, has a well behaved inverse.

We first discuss the properties of $\Omega_m$ for all fixed $m$. This is done in the next Lemma.

Lemma 4.1. Let $\Omega_m$ be defined as in (3.4). Then, $\Omega_m^{-1}$ exists for all $m$.

Proof.  See  Appendix  B  ■

Invertibility of $\Omega_m$ for all $m$ however is not enough to show that $\Omega$ is invertible. We
briefly review the theory of invertible operators (see Gohberg and Goldberg (1980), p. 65).
For two Banach spaces $B_1$ and $B_2$ denote the set of bounded linear operators mapping
$B_1$ into $B_2$ by $L(B_1, B_2)$. Then $A \in L(B_1, B_2)$ is invertible if there exists an operator
$A^{-1} \in L(B_2, B_1)$ such that $A^{-1}Ax = x$ for all $x \in B_1$ and $AA^{-1}y = y$ for all $y \in B_2$. Let
$\operatorname{Ker} A = \{x \in B_1 : Ax = 0\}$ and $\operatorname{Im} A = \{Ax : x \in B_1\}$. Then $A$ is invertible if $\operatorname{Ker} A = \{0\}$
and $\operatorname{Im} A = B_2$.

Following Hanani, Netanyahu and Reichaw (1968) we now choose $B_1, B_2$ as linear
spaces whose points are sequences of real numbers denoted by $x = \{x_1, x_2, \ldots\}$ and $y = \{y_1, y_2, \ldots\}$.
Define the norm $\|x\|_2 = (\sum_{i=1}^{\infty} |x_i|^2)^{1/2}$. Then $B$ is the space of all sequences
that are bounded under the $\|\cdot\|_2$ norm and is denoted by $l^2$. An operator $A : l^2 \to l^2$ is
defined by the infinite dimensional matrix $A = (a_{ij})$, $i, j = 1, 2, \ldots$ such that $y = Ax \in l^2$
for all $x \in l^2$. This can be written element by element as $y_i = \sum_{j=1}^{\infty} a_{i,j} x_j$ for all $i$. The
operator $A$ is invertible if the only solution to $Ax = 0$ is $x = \{0, 0, \ldots\}$ and $\operatorname{Im} A = l^2$.
Note that $l^2$ is a Hilbert space with inner product $(x, y) = \sum_{j=1}^{\infty} x_j y_j$. From Theorem 11.4
in Gohberg and Goldberg (1980) it follows that $(\operatorname{Ker} A)^{\perp} = \operatorname{Im} A$ for a self adjoint operator $A$.
It is thus enough to show $\operatorname{Ker} A = \{0\}$ for $A : l^2 \to l^2$, $A$ selfadjoint.

Consider now the following infinite dimensional operator associated with $\Omega_m$. Define the operator $\Omega$ component-wise by its image for all $x \in l^2$ by $b_i = \lim_{m\to\infty}\sum_{j=1}^{m}\alpha_{i,j}x_j$ where $\alpha_{i,j}$ is defined in (2.3). In other words $\Omega$ is the infinite dimensional matrix such that any upper left corner submatrix of dimension $m \times m$ has the same elements as $\Omega_m$. We use arguments similar to the ones in the proof of Lemma (4.1) to establish invertibility.

Lemma 4.2. Let $\Omega_m$ be defined as in (3.4). Then $\Omega \in L(l^2,l^2)$ and $\Omega^{-1}$ exists.

Proof.  See  Appendix  B  ■ 

Remark 9. The fact that the image of $\Omega$ is square summable, i.e. $\Omega x \in l^2$, depends on the summability properties of $\sigma(k,l)$. The interpretation of the summability condition is that the instruments $\varepsilon_t$ become unrelated in their fourth moments as the time spread between them increases.

By the Closed Graph Theorem (Gohberg and Goldberg (1980), Theorem X.4.2) it also follows that $\Omega^{-1}$ is bounded, i.e., $\|\Omega^{-1}\| = \sup_{\|x\|\le 1}\|\Omega^{-1}x\|_2 < \infty$. Thus $\sup_{i,j}|\omega_{ij}| < \infty$ where $[\Omega^{-1}]_{ij} = \omega_{ij}$.

Next, we need to establish properties of the matrix $\Omega_m^{-1}$ as $m$ tends to infinity. In particular we want to establish that the inverse $\Omega_m^{-1}$ approximates $\Omega^{-1}$ as $m \to \infty$.


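Lemma 4.3. Let $\Omega_m$ be as defined in (3.4). Define $\Omega_m^{-1}$ such that $\Omega_m^{-1}\Omega_m = I_m$ and $\Omega_m\Omega_m^{-1} = I_m$ $\forall m$. Let

$$\Omega_m^{*} = \begin{pmatrix} \Omega_m & 0 \\ 0 & \sigma^4 I \end{pmatrix} \qquad (4.1)$$

where $I$ stands for an infinite dimensional identity matrix. Then $\Omega_m^{*-1}$ exists and $\|\Omega_m^{*-1} - \Omega^{-1}\| \to 0$ as $m \to \infty$.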


Proof.  See  Appendix  B  ■ 

Remark 10. Lemma (4.3) provides an algorithm to approximate the infinite dimensional inverse $\Omega^{-1}$.
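The approximation in Lemma (4.3) can be illustrated numerically. The sketch below is only an illustration under stated assumptions: a large finite matrix of dimension `N` stands in for the infinite dimensional operator, and the fourth moment structure $\sigma(k,l) = 0.5\,\rho^{k+l}$ is a hypothetical choice that is absolutely summable. The helper name `omega_star_inverse` is not from the paper.

```python
import numpy as np

def omega_star_inverse(omega, m, sigma4):
    """Inverse of the block-diagonal approximation diag(Omega_m, sigma4 * I)."""
    n_total = omega.shape[0]
    inv = np.eye(n_total) / sigma4               # lower-right block: sigma^{-4} I
    inv[:m, :m] = np.linalg.inv(omega[:m, :m])   # upper-left block: Omega_m^{-1}
    return inv

# Hypothetical fourth-moment structure, N stands in for "infinity".
N, sigma4, rho = 200, 1.0, 0.5
v = rho ** np.arange(1, N + 1)
omega = sigma4 * np.eye(N) + 0.5 * np.outer(v, v)   # Omega = Sigma + sigma^4 I
omega_inv = np.linalg.inv(omega)

ms = [2, 5, 10, 20]
# Spectral-norm distance between the exact inverse and the approximation.
errors = [np.linalg.norm(omega_inv - omega_star_inverse(omega, m, sigma4), 2)
          for m in ms]
for m, err in zip(ms, errors):
    print(m, err)
```

In this example the operator norm error shrinks as the truncation point `m` grows, which is the behaviour the lemma describes.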

We define the $d$ dimensional product of sequence spaces $l^2_d = l^2 \times \ldots \times l^2$. Define the infinite dimensional matrix $P = [b_1, \ldots]$ by stacking elements of the sequence $\{b_k\}_{k=1}^{\infty} \in l^2_d$. Introduce notation for the reverse operation of extracting a sequence from the rows of a matrix by defining $b(P) := \{b_k\}_{k=1}^{\infty}$. Finally, consider the matrix $(P'\Omega^{-1}P)^{-1}$.

Using this notation we can state our next theorem which establishes a lower bound for the covariance matrix.

Theorem 4.4. For any $a \in \mathcal{A}$ let $A' = [a_1, \ldots]$ and $P$ and $\Omega$ as previously defined. If $a(P'A) \in \mathcal{A}''$ then the matrix $(P'A)^{-1}A'\Omega A(A'P)^{-1}$ satisfies

$$(P'A)^{-1}A'\Omega A(A'P)^{-1} - (P'\Omega^{-1}P)^{-1} \ge 0$$

where $\ge 0$ stands for positive semi-definite.

Proof.  See  Appendix  B  ■ 

Remark 11. If $a \in \mathcal{A}' \cap \mathcal{A}''$ then $(P'A)^{-1}A'\Omega A(A'P)^{-1}$ is the asymptotic covariance matrix of an estimator based on $a$. However, it is important to point out that the lower bound is for IV estimators in the class of all instruments which are linear functions of the innovation process and have an innovation filter in $\mathcal{A}''$. The construction of the lower bound does not involve consistency restrictions for the instruments. In order to construct an efficient estimator in practice it has to be established that the optimal instrument does in fact satisfy consistency restrictions.
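A finite dimensional analogue of the inequality in Theorem (4.4) is easy to check numerically. The following sketch assumes a randomly generated positive definite matrix in place of $\Omega$ and random matrices in place of $P$ and $A$; all names and dimensions are illustrative and not taken from the paper.

```python
import numpy as np

def iv_covariance(P, A, omega):
    """(P'A)^{-1} A' Omega A (A'P)^{-1} for finite dimensional P, A, Omega."""
    pa_inv = np.linalg.inv(P.T @ A)
    return pa_inv @ (A.T @ omega @ A) @ pa_inv.T

def lower_bound(P, omega):
    """(P' Omega^{-1} P)^{-1}."""
    return np.linalg.inv(P.T @ np.linalg.solve(omega, P))

rng = np.random.default_rng(0)
m, d = 12, 2                                # truncation point and parameter dimension
M = rng.normal(size=(m, m))
omega = M @ M.T + m * np.eye(m)             # arbitrary symmetric positive definite matrix
P = rng.normal(size=(m, d))
A = rng.normal(size=(m, d))                 # an arbitrary (suboptimal) instrument filter

gap = iv_covariance(P, A, omega) - lower_bound(P, omega)
min_eig = np.linalg.eigvalsh((gap + gap.T) / 2).min()

A_opt = np.linalg.solve(omega, P)           # finite analogue of A = Omega^{-1} P
gap_opt = iv_covariance(P, A_opt, omega) - lower_bound(P, omega)
print(min_eig, np.abs(gap_opt).max())
```

For an arbitrary filter the difference is positive semi-definite up to rounding error, and it vanishes when the filter is set to the finite analogue of $\Omega^{-1}P$.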

5.  Optimal  Instrumental  Variables  Estimators 

Theorem (4.4) immediately leads to the construction of an efficient IV estimator. The optimal instrument is determined by the linear filter $A' = P'\Omega^{-1}$. It is not a priori true that the optimal filter also results in a consistent estimator. However, for important parametric examples such as the ARMA class this is indeed the case.

Theorem 5.1. Assume $C(\beta,L) = \theta(L)/\phi(L)$ and the parameter space $\Theta$ satisfies Assumption (B4). If $A' = P'\Omega^{-1}$ then the sequence $a = a(P'\Omega^{-1})$ defined by the rows of $A$ satisfies $a \in \mathcal{A}' \cap \mathcal{A}''$. We will write $a(A) \in \mathcal{A}' \cap \mathcal{A}''$.

Theorem (5.1) together with Theorem (3.3) and Theorem (4.4) establish that the IV estimator for the ARMA model constructed with instruments satisfying $A' = P'\Omega^{-1}$ achieves a lower bound of the same type as in Hansen and Singleton (1991) but under the weaker martingale difference sequence assumptions on $\varepsilon_t$ detailed in Assumption (A1).

Feasible versions of the optimal IV procedure have to be based on approximations of the optimal instrument $z_t$. Such approximations replace unobserved $\varepsilon_t$ by observed residuals $\hat\varepsilon_t = y_t - \sum_{j=1}^{t-1} c(\beta_0,j)y_{t-j}$ for $t = 1,\ldots,n$ where $\hat\varepsilon_1 = y_1$. Feasible versions of $\hat\varepsilon_t$ are obtained by replacing $\beta_0$ with a first stage consistent estimator $\tilde\beta$. Gaussian PMLE procedures, which are consistent but inefficient in our context, can be used to generate first stage estimators.
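The residual construction can be sketched for a concrete case. The example below assumes an ARMA(1,1) model $y_t = \phi y_{t-1} + \varepsilon_t + \theta\varepsilon_{t-1}$ with zero pre-sample values, for which the autoregressive weights under the sign convention $\hat\varepsilon_t = y_t - \sum_j c_j y_{t-j}$ are $c_j = (\phi+\theta)(-\theta)^{j-1}$. The parametrization, the parameter values and the function names are assumptions made for illustration.

```python
import numpy as np

def ar_inf_coefficients(phi, theta, n):
    """Weights c_j, j = 1..n-1, for y_t = phi*y_{t-1} + e_t + theta*e_{t-1}."""
    j = np.arange(1, n)
    return (phi + theta) * (-theta) ** (j - 1)

def truncated_residuals(y, coeffs):
    """e_hat_t = y_t - sum_{j=1}^{t-1} c_j y_{t-j}, so that e_hat_1 = y_1."""
    n = len(y)
    resid = np.empty(n)
    for t in range(n):
        past = y[:t][::-1]                 # y_{t-1}, ..., y_1 (empty for the first observation)
        resid[t] = y[t] - coeffs[:t] @ past
    return resid

# Simulate the ARMA(1,1) with zero initial conditions (hypothetical parameter values).
rng = np.random.default_rng(1)
n, phi, theta = 200, 0.6, 0.4
eps = rng.normal(size=n)
y = np.zeros(n)
for t in range(n):
    y_prev = y[t - 1] if t > 0 else 0.0
    e_prev = eps[t - 1] if t > 0 else 0.0
    y[t] = phi * y_prev + eps[t] + theta * e_prev

resid = truncated_residuals(y, ar_inf_coefficients(phi, theta, n))
```

With zero initial conditions the truncated filter recovers the simulated innovations exactly; in a feasible procedure the true parameters would be replaced by first stage estimates.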

Instruments are then given by $z_t = \sum_{j=1}^{t-1} a_j\hat\varepsilon_{t-j}$. The empirical analog of the moment restriction now becomes

$$G_n(\beta,a) = \frac{1}{n}\sum_{t=1}^{n} z_t \sum_{j=0}^{t-1} c_j^{\beta}\, y_{t-j}. \qquad (5.1)$$

An algebraically equivalent formulation of (5.1) is given by

$$G_n(\beta,a) = (2\pi)^{-1}\int_{-\pi}^{\pi} C^{-1}(\beta,e^{-i\lambda})\,h(\beta_0,\lambda)\,I_{n,yy}(\lambda)\,d\lambda$$

where $I_{n,yy}(\lambda)$ is the data periodogram and the filter $h(\lambda) : [-\pi,\pi] \to \mathbb{C}^d$ is defined as $h(\beta_0,\lambda) = l_a(-\lambda)C^{-1}(\beta_0,e^{i\lambda})$ with

$$l_a(\lambda) = \sum_{j=1}^{\infty} a_j e^{-i\lambda j}.$$

The coefficients of the optimal instrument are given by

$$a_j = \sum_{k=1}^{\infty} b_k\,\omega_{kj}$$

where $b_k$ is the Fourier coefficient of the derivative of the log spectral density of $y_t$ and $\omega_{kj}$ is the $kj$-th entry of the inverse $\Omega^{-1}$. The $b_k$ coefficients have simple interpretations in special parametric models. In the case of an AR(p) model for example they are equivalent to the impulse response function and can therefore be computed easily. It can also be noted that the Gaussian estimators are obtained by setting $a_j = b_j$.
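The formula for the optimal instrument coefficients can be made concrete for an AR(1) model, where $\partial \ln C/\partial\phi = \sum_{k\ge 1}\phi^{k-1}e^{-i\lambda k}$ so that $b_k = \phi^{k-1}$. The sketch below truncates at `m` lags and uses two hypothetical choices for the fourth moment matrix: a homoskedastic one, $\sigma^4 I$, and a diagonal heteroskedastic one. Neither matrix nor the function names come from the paper.

```python
import numpy as np

def ar1_b(phi, m):
    """b_k = phi^(k-1), k = 1..m, the AR(1) impulse response."""
    return phi ** np.arange(m)

def instrument_coefficients(b, omega_m):
    """a_j = sum_k b_k * omega_kj with omega_kj the entries of Omega_m^{-1}.

    Omega_m is symmetric, so this equals Omega_m^{-1} b.
    """
    return np.linalg.solve(omega_m, b)

m, phi, sigma4 = 8, 0.5, 1.0
b = ar1_b(phi, m)

# Homoskedastic case: Omega_m = sigma^4 I, the optimal filter is proportional to b.
a_homo = instrument_coefficients(b, sigma4 * np.eye(m))

# Hypothetical heteroskedastic case with a diagonal Omega_m.
k = np.arange(1, m + 1)
omega_het = np.diag(sigma4 + 2.0 * 0.5 ** k)
a_het = instrument_coefficients(b, omega_het)
print(a_homo, a_het)
```

Under homoskedasticity the optimal weights coincide with the Gaussian choice $a_j = b_j$ up to scale, while in the heteroskedastic example they are reweighted lag by lag.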

It is shown in Kuersteiner (1997) that a sufficient condition for the validity of the approximation is that the coefficients of the instruments satisfy

$$\sum_{j=1}^{\infty} j\,|a_{j,k}| < \infty \text{ for } k = 1,\ldots,d. \qquad (5.2)$$

The following theorem shows that under strengthened summability restrictions on the fourth order cumulants Condition (5.2) is satisfied for the optimal instrumental variables estimator of the ARMA(p,q) model.

Theorem 5.2. Assume $C(\beta,L) = \theta(L)/\phi(L)$ and the parameter space $\Theta$ satisfies Assumption (B4). Strengthen Assumption (A1 iv) to $\sum_{r=1}^{\infty}\sum_{s=1}^{\infty} s\,|\sigma(s,r)| = B < \infty$. By symmetry this implies $\sum_{r=1}^{\infty}\sum_{s=1}^{\infty} r\,|\sigma(s,r)| = B < \infty$. If $A' = P'\Omega^{-1}$ then $a = a(P'\Omega^{-1})$ satisfies (5.2).

Feasible versions of the optimal estimator are then obtained by replacing $G_n(\beta,a)$ by $\hat G_n(\beta,a)$ where in $\hat G_n(\beta,a)$ we replace $h(\beta_0,\lambda)$ by $\hat l_a(-\lambda)C^{-1}(\tilde\beta,e^{i\lambda})$ and $\tilde\beta$ is a consistent first stage estimate. The challenging part is to estimate $l_a(\lambda)$ consistently. For a case with additional restrictions on the moments of $\varepsilon_t$ this has been done in Kuersteiner (1997). In that particular case it is possible to estimate $l_a(\lambda)$ consistently without the need to introduce bandwidth or truncation parameters. The simplification comes from the fact that in that particular case $\Omega^{-1}$ is diagonal such that $a_j = b_j/\alpha_{jj}$.

In the more general case the elements $\omega_{kj}$ can be estimated from a sample analog of the approximation matrix $\Omega_m^{*}$ defined in (4.1). Using similar arguments as in Kuersteiner (1997) it can be shown that the elements of this matrix can be uniformly consistently estimated as long as $m/\sqrt{n} \to 0$. From $\hat\Omega_m^{*}$ we can obtain estimates of $\omega_{kj}$. We then form the truncated estimate of the $a_j$ coefficients by setting $\hat a_j = \sum_{k=1}^{m}\hat b_k\hat\omega_{kj}$.
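One way to read the sample analog is sketched below. The estimator form used here, averages of $\hat\varepsilon_t^2\hat\varepsilon_{t-k}\hat\varepsilon_{t-l}$, is an assumption based on the definition $\Omega_m = E Y_tY_t'$ with $Y_t = [\varepsilon_t\varepsilon_{t-1},\ldots,\varepsilon_t\varepsilon_{t-m}]'$; the paper does not spell out the estimator, and the function names and the value of `m` are hypothetical.

```python
import numpy as np

def sample_omega(resid, m):
    """Sample analog of Omega_m: averages of e_t^2 * e_{t-k} * e_{t-l}, k,l = 1..m."""
    e = np.asarray(resid, dtype=float)
    n = len(e)
    # Row i of `lags` holds (e_{t-1}, ..., e_{t-m}) for t = m + i.
    lags = np.column_stack([e[m - k : n - k] for k in range(1, m + 1)])
    weights = e[m:] ** 2                      # e_t^2 for the same t
    return (lags * weights[:, None]).T @ lags / (n - m)

def truncated_instrument(b_hat, omega_hat):
    """a_hat_j = sum_{k<=m} b_hat_k * omega_hat_kj, i.e. Omega_hat^{-1} b_hat."""
    return np.linalg.solve(omega_hat, b_hat)

# Illustration with i.i.d. standard normal "residuals", for which Omega_m = I.
rng = np.random.default_rng(2)
resid = rng.normal(size=200_000)
m = 3
omega_hat = sample_omega(resid, m)
a_hat = truncated_instrument(0.5 ** np.arange(m), omega_hat)
print(omega_hat)
```

With i.i.d. standard normal input the estimate is close to the identity matrix, so the truncated instrument weights stay close to the $b_k$ coefficients.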

The development of a fully feasible estimator requires an optimal bandwidth selection procedure for the parameter $m$. This is beyond the scope of this paper and will be left for future research.

A. Appendix - Lemmas

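The following Lemmas are used to derive the asymptotic distribution of the IV estimators.

Lemma A.1. Under Assumption (A1) for each $m \in \{1,2,\ldots\}$, $m$ fixed, the vector

$$\frac{1}{\sqrt{n}}\sum_{t=1}^{n}\left[\varepsilon_t\varepsilon_{t-1},\ldots,\varepsilon_t\varepsilon_{t-m}\right]' \Rightarrow N(0,\Omega_m)$$

with

$$\Omega_m = \begin{pmatrix} \sigma(1,1)+\sigma^4 & \cdots & \sigma(1,m) \\ \vdots & \ddots & \vdots \\ \sigma(m,1) & \cdots & \sigma(m,m)+\sigma^4 \end{pmatrix}.$$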

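Proof. We note that individually all the terms $\varepsilon_t\varepsilon_{t-k}$ with $k \ge 1$ are martingale differences. Now define $Y_t' = [\varepsilon_t\varepsilon_{t-1},\ldots,\varepsilon_t\varepsilon_{t-m}]$. Then also $E(Y_t \mid \mathcal{F}_{t-1}) = 0$ so that $Y_t$ is a vector martingale difference sequence. To show that $n^{-1/2}\sum_t Y_t \Rightarrow N(0,\Omega_m)$ it is enough to show that for all $\ell \in \mathbb{R}^m$ such that $\ell'\ell = 1$ we have $n^{-1/2}\sum_t \ell'\tilde Y_t \Rightarrow N(0,1)$ where now $\tilde Y_t = \Omega_m^{-1/2}Y_t$ and $\Omega_m = EY_tY_t'$. This is easily evaluated to be

$$\Omega_m = E\begin{pmatrix} \varepsilon_t^2\varepsilon_{t-1}^2 & \cdots & \varepsilon_t^2\varepsilon_{t-1}\varepsilon_{t-m} \\ \vdots & \ddots & \vdots \\ \varepsilon_t^2\varepsilon_{t-m}\varepsilon_{t-1} & \cdots & \varepsilon_t^2\varepsilon_{t-m}^2 \end{pmatrix} = \begin{pmatrix} \sigma(1,1)+\sigma^4 & \cdots & \sigma(1,m) \\ \vdots & \ddots & \vdots \\ \sigma(m,1) & \cdots & \sigma(m,m)+\sigma^4 \end{pmatrix}.$$

Next we note that for any $\ell \in \mathbb{R}^m$ such that $\ell'\ell = 1$, $\ell$ fixed, $\ell'\tilde Y_t$ is a martingale difference sequence by linearity of the conditional expectation and the fact that $m$ is fixed and finite. We can therefore apply a martingale CLT (see Hall and Heyde, 1980, Theorem 3.2, p. 52) to the sum $\sum_t Y_{nt} = n^{-1/2}\sum_t \ell'\tilde Y_t$. Checking the conditions of the CLT is straightforward and therefore omitted. ■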
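The covariance matrix in Lemma (A.1) differs from $\sigma^4 I$ as soon as the innovations are conditionally heteroskedastic. The simulation below is a sketch under an assumed ARCH(1) specification with Gaussian shocks, which is only one example of a martingale difference sequence; the parameter values are chosen so that $\sigma^2 = 1$ and the required moments exist.

```python
import numpy as np

def simulate_arch1(n, omega0, alpha, rng, burn=500):
    """e_t = sqrt(h_t) u_t with h_t = omega0 + alpha * e_{t-1}^2 and u_t ~ N(0,1)."""
    u = rng.normal(size=n + burn)
    e = np.zeros(n + burn)
    for t in range(1, n + burn):
        h = omega0 + alpha * e[t - 1] ** 2
        e[t] = np.sqrt(h) * u[t]
    return e[burn:]                            # drop the burn-in period

def product_covariance(e, m):
    """Sample second moment matrix of Y_t = e_t * (e_{t-1}, ..., e_{t-m})."""
    n = len(e)
    Y = np.column_stack([e[m:] * e[m - k : n - k] for k in range(1, m + 1)])
    return Y.T @ Y / (n - m)

rng = np.random.default_rng(3)
e = simulate_arch1(200_000, omega0=0.8, alpha=0.2, rng=rng)
omega_m = product_covariance(e, m=3)
sigma4 = np.mean(e ** 2) ** 2                  # squared sample variance
print(omega_m, sigma4)
```

In this design the first diagonal element of the estimated matrix is clearly larger than the squared variance, reflecting a positive $\sigma(1,1)$, while the off-diagonal elements are close to zero.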

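Lemma A.2. Let $I_{n,yz}(\lambda) = \omega_{n,y}(\lambda)\omega_{n,z}(-\lambda)$. $I_{n,\varepsilon\varepsilon}(\lambda)$ is the periodogram of $\{\varepsilon_1,\ldots,\varepsilon_n\}$. Assume $\varepsilon_t$ satisfy Assumption (A1) and that $y_t = \sum_{j=0}^{\infty}\psi_j\varepsilon_{t-j}$ with $\psi(\lambda) = \sum_{j=0}^{\infty}\psi_je^{-i\lambda j}$ such that $\sum_{j=0}^{\infty}|j|\,|\psi_j| < \infty$. Also let $z_t = \sum_{j=0}^{\infty}a_j\varepsilon_{t-j}$ with $a \in \mathcal{A}$. Let $\varsigma(.)$ be a function on $[-\pi,\pi] \to \mathbb{C}$ with absolutely summable Fourier coefficients $\{\varsigma_k, -\infty < k < \infty\}$ such that $\varsigma(\lambda) = \sum_{j=-\infty}^{\infty}\varsigma_je^{-i\lambda j}$. Then for any $\eta, \epsilon > 0$

$$P\left(\left|\sqrt{n}\,(2\pi)^{-1}\left[\int_{-\pi}^{\pi} I_{n,yz}(\lambda)\varsigma(\lambda)\,d\lambda - \int_{-\pi}^{\pi} I_{n,\varepsilon z}(\lambda)C(\beta_0,\lambda)\varsigma(\lambda)\,d\lambda\right]\right| > \eta\right) < \epsilon$$

as $n \to \infty$.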


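Proof. First an expression for $R_n(\lambda) = I_{n,yz}(\lambda) - I_{n,\varepsilon z}(\lambda)\psi(\lambda)$ is obtained. Let $\omega_{n,y}(\lambda) = n^{-1/2}\sum_{t=1}^{n}y_te^{-i\lambda t}$ be the discrete Fourier transform of the data. Then

$$\omega_{n,y}(\lambda) = \psi(\lambda)\omega_{n,\varepsilon}(\lambda) + n^{-1/2}\sum_{j=0}^{\infty}\psi_je^{-i\lambda j}U_{n,j}(\lambda) \qquad (A.1)$$

where $U_{n,j}(\lambda) = \sum_{t=1-j}^{n-j}\varepsilon_te^{-i\lambda t} - \sum_{t=1}^{n}\varepsilon_te^{-i\lambda t}$ such that

$$R_n(\lambda) := I_{n,yz}(\lambda) - \psi(\lambda)I_{n,\varepsilon z}(\lambda) = \omega_{n,z}(-\lambda)\,n^{-1/2}\sum_{j=0}^{\infty}\psi_je^{-i\lambda j}U_{n,j}(\lambda).$$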

Note  that  (2tt)-1  J'  Rn  (A)  ?  (A)  dA  =  n~l  YlkLi  E£o  Et=i  Em=-oo  aki>ic™.£t-k  (er-l  -  £n-M-r) 
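Then using the Markov inequality it is enough to consider

$$E\left|\sqrt{n}\,(2\pi)^{-1}\int_{-\pi}^{\pi}R_n(\lambda)\varsigma(\lambda)\,d\lambda\right| \le 2\sigma^2n^{-1/2}\sum_{k=1}^{\infty}\sum_{l=0}^{\infty}\sum_{m=-\infty}^{\infty}|a_k\psi_l\varsigma_m|\,|l| \to 0$$

since the last term is bounded from $\sum_{k=1}^{\infty}|a_k| < \infty$ and $\sum_{l=0}^{\infty}|l|\,|\psi_l| < \infty$. ■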

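Lemma A.3. Let $I_{n,\varepsilon z}(\lambda) = \omega_{n,\varepsilon}(\lambda)\omega_{n,z}(-\lambda)$. Assume $\varepsilon_t$ satisfy Assumption (A1) and let $z_t = \sum_{j=1}^{\infty}a_j\varepsilon_{t-j}$ with $a \in \mathcal{A}$. Then for any $\ell \in \mathbb{R}^d$ such that $\ell'\ell = 1$,

$$n^{1/2}(2\pi)^{-1}\int_{-\pi}^{\pi}\ell'I_{n,\varepsilon z}(\lambda)\,d\lambda \overset{d}{\to} N\left(0, \sum_{l=1}^{\infty}\sum_{k=1}^{\infty}\alpha_{k,l}\,\ell'a_ka_l'\ell\right).$$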

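Proof. First note that $(2\pi)^{-1}\int_{-\pi}^{\pi}I_{n,\varepsilon z}(\lambda)\,d\lambda = n^{-1}\sum_{t=1}^{n}\varepsilon_tz_t$ such that $E\,n^{1/2}(2\pi)^{-1}\int_{-\pi}^{\pi}I_{n,\varepsilon z}(\lambda)\,d\lambda = 0$. It also follows that $\varepsilon_tz_t$ is a martingale difference sequence. However $z_t = \sum_{k=1}^{\infty}a_k\varepsilon_{t-k}$ such that a direct application of Lemma (A.1) is not possible.

For a fixed $m$ we introduce $z_t^m = \sum_{k=1}^{m}a_k\varepsilon_{t-k}$ such that $\omega_{n,z}^m(\lambda) = n^{-1/2}\sum_{t=1}^{n}z_t^me^{-i\lambda t}$ and $I_{n,\varepsilon z}^m(\lambda) = \omega_{n,\varepsilon}(\lambda)\omega_{n,z}^m(-\lambda)$. From Billingsley (1968, Theorem 4.2) it is enough to show that for all $\epsilon > 0$,

$$\lim_{m\to\infty}\limsup_{n\to\infty}P\left(\left|n^{1/2}(2\pi)^{-1}\int_{-\pi}^{\pi}\ell'\left(I_{n,\varepsilon z}^m(\lambda) - I_{n,\varepsilon z}(\lambda)\right)d\lambda\right| > \epsilon\right) = 0$$

where

$$n^{1/2}(2\pi)^{-1}\int_{-\pi}^{\pi}\ell'\left(I_{n,\varepsilon z}^m(\lambda) - I_{n,\varepsilon z}(\lambda)\right)d\lambda = -n^{-1/2}\sum_{t=1}^{n}\sum_{k>m}\ell'a_k\varepsilon_t\varepsilon_{t-k}.$$

Since $E\,\ell'a_k\varepsilon_t\varepsilon_{t-k} = 0$ it is enough to consider

$$n^{-1}\sum_{t=1}^{n}E\left(\sum_{k>m}\ell'a_k\varepsilon_t\varepsilon_{t-k}\right)^2 = n^{-1}\sum_{t=1}^{n}\sum_{j>m}\sum_{k>m}\ell'a_ka_j'\ell\,\alpha_{k,j} \to 0 \text{ as } m \to \infty.$$

Applying Lemma (A.1) then gives the result. ■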

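The following Lemmas are used in the consistency proof to show that the criterion function is nonzero almost surely when evaluated along any convergent sequence of parameter estimates that do not converge to the true parameter value.

Lemma A.4. Assumption (B1) implies that $c(\beta,j) = (2\pi)^{-1}\int_{-\pi}^{\pi}C^{-1}(\beta,\lambda)e^{i\lambda j}d\lambda$ satisfies $\sum_j j\,|c(\beta,j)| < \infty$ for all $\beta \in \Theta$.

Proof. Since $C^{-1}(\beta,\pi) = C^{-1}(\beta,-\pi)$ it follows from integration by parts that

$$|c(\beta,j)| = j^{-1}\left|(2\pi)^{-1}\int_{-\pi}^{\pi}\frac{\partial C^{-1}(\beta,\lambda)}{\partial\lambda}e^{i\lambda j}d\lambda\right|. \qquad (A.2)$$

From $\partial C^{-1}(\beta,\lambda)/\partial\lambda = -C^{-2}(\beta,\lambda)\,\partial C(\beta,\lambda)/\partial\lambda$ and the fact that $C(\beta,\lambda)$ satisfies $\sum_j j\,|c(\beta,j)| < \infty$ it follows that $\partial C^{-1}(\beta,\lambda)/\partial\lambda$ has absolutely summable Fourier coefficients. Rearranging (A.2) and summing over $j$ then gives the result. ■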

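Lemma A.5. Assume (A1), (B1-B3), (C1, C2). Let $z_t = \lim_{m\to\infty}A_m'\varepsilon_t^m$ with $A_m' = [a_1,\ldots,a_m]$, $\{a_k\}_{k=1}^{\infty} \in \mathcal{A}'$ and $\varepsilon_t^m = [\varepsilon_{t-1},\ldots,\varepsilon_{t-m}]'$. Then for any $\beta \in \Theta$, $G_n(\beta,a) \to G(\beta,a)$ almost surely.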

Proof. Without loss of generality assume that $z_t \in \mathbb{R}$. Let $Ey_tz_s = \gamma_{yz}(t-s)$, and $\mathrm{cum}(y_t,z_s,y_q,z_r) = \kappa_{yyzz}(t-s,t-q,t-r)$. Then, from Assumption (A1) and the proof of Theorem 2.8.1 in Brillinger (1981) it follows that $\sum_j|\gamma_{yz}(j)| < \infty$ and $\sum_{s,q,r}|\kappa_{yyzz}(s,q,r)| < \infty$. Let $X_n = G_n(\beta,a) - EG_n(\beta,a)$. Since $EG_n(\beta,a) \to G(\beta,a)$ as $n \to \infty$ it is enough to show that $X_n \to 0$ almost surely. This follows from verifying the conditions of Lemma 3 in Gaposhkin (1980). Using the short hand notation $c_j = c(\beta,j)$ we have

$$EX_n^2 = n^{-2}\sum_{t=1}^{n}\sum_{s=t+1}^{n}\sum_{q=1}^{n}\sum_{r=q+1}^{n}\left[\gamma_{yz}(t-r)\gamma_{yz}(q-s) + \gamma_{yy}(t-q)\gamma_{zz}(r-s) + \kappa_{yyzz}(t-s,t-q,t-r)\right]c_{s-t}c_{r-q}$$

$$\le K_0n^{-2}\sum_{s,t}c_{s-t}^2 = K_0\frac{1}{n}\sum_{j=-n}^{n}\left(1-\frac{|j|}{n}\right)c_j^2 \le K_1n^{-1}$$

with $K_0 = \left(\sum_{j=-\infty}^{\infty}|\gamma_{yz}(j)|\right)^2 + \sum_{j=-\infty}^{\infty}|\gamma_{yy}(j)|\sum_{j=-\infty}^{\infty}|\gamma_{zz}(j)| + \sum_{s,q,r}|\kappa_{yyzz}(s,q,r)|$ and $K_1 = K_0\sum_{j=-\infty}^{\infty}c_j^2$. Next consider $E(X_n - X_{n_1})^2$ for $n/2 < n_1 < n$. First

$$X_n - X_{n_1} = \frac{n_1-n}{nn_1}\sum_{t=1}^{n_1}\sum_{s=t+1}^{n_1}c_{s-t}(y_tz_s - Ey_tz_s) + n^{-1}\sum_{t=n_1}^{n}\sum_{s=t+1}^{n}c_{s-t}(y_tz_s - Ey_tz_s)$$

with

$$E\left(n^{-1}\sum_{t=n_1}^{n}\sum_{s=t+1}^{n}c_{s-t}(y_tz_s - Ey_tz_s)\right)^2 \le K_0n^{-2}\sum_{t=n_1}^{n}\sum_{s=t+1}^{n}c_{s-t}^2 \le K_1n^{-2}(n-n_1).$$

Since

$$E\left(\frac{n_1-n}{nn_1}\sum_{t=1}^{n_1}\sum_{s=t+1}^{n_1}c_{s-t}(y_tz_s - Ey_tz_s)\right)^2 \le K_1\frac{(n_1-n)^2}{n^2n_1} \le K_1\left(n^{-2}(n-n_1)\right)$$

it follows that there exists a constant $K_2 < \infty$ such that

$$E(X_n - X_{n_1})^2 \le K_2(n-n_1)n^{-2}$$

such that $X_n = o(1)$ almost surely by Lemma 3 of Gaposhkin (1980). ■

Lemma A.6. Assume (A1), (B1-B3), (C1-C2). Let $z_t = \lim_{m\to\infty}A_m'\varepsilon_t^m$ with $A_m' = [a_1,\ldots,a_m]$, $\{a_k\}_{k=1}^{\infty} \in \mathcal{A}'$ and $\varepsilon_t^m = [\varepsilon_{t-1},\ldots,\varepsilon_{t-m}]'$. Then for any convergent sequence $\beta_n \in \Theta$ with $\beta_n \to \beta \in \Theta$ there exists an event $E$ with probability one such that i) if $\beta \notin \partial\Theta$ then for all outcomes in $E$, $G_n(\beta_n,a) \to G(\beta,a)$ and ii) if $\beta \in \partial\Theta$ then $\liminf_n\|G_n(\beta_n,a)\| > 0$ for all outcomes in $E$.

Remark 12. The behavior of $G_n(\beta_n,a)$ as $\beta_n$ approaches the boundary depends on the sequence $\beta_n$. It is therefore not possible to describe the limit of $G_n(\beta_n,a)$ for all convergent sequences. Possible behavior includes convergence to a constant, divergence as a nonrandom function of $n$, convergence to a limit random variable which can have unit root or near unit root asymptotics, or explosive random behavior. All we need to show however is that $\|G_n(\beta_n,a)\|$ stays away from zero for large enough $n$ with probability one. The idea is therefore to distinguish between random and nonrandom limits and to show that nonrandom limits involve constants that are bounded away from zero.

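Proof. i) For each $\epsilon > 0$ there exists an $n_0 < \infty$ and $\delta > 0$ such that for $n > n_0$, $\|\beta_n - \beta\| < \delta$ and

$$\sup_{\|\beta'-\beta\|<\delta}\sup_{\lambda}\left|C^{-1}(\beta',\lambda) - C^{-1}(\beta,\lambda)\right| < \epsilon$$

by continuity of $C^{-1}(\beta,\lambda)$ at $\beta \in \Theta$. For $\beta'$ such that $\|\beta'-\beta\| < \delta$ the lag polynomial $C^{-1}(\beta',z)$ has an expansion with coefficients $c(\beta',j)$ such that $\sum_{j=1}^{\infty}j\,|c(\beta',j)| < \infty$. We will use the short hand notation $c_j' = c(\beta',j)$. Without loss of generality assume $z_t \in \mathbb{R}$. Let $X_n(\beta) = G_n(\beta,a) - EG_n(\beta,a)$ and define $X_n = \sup_{\|\beta'-\beta\|<\delta}|X_n(\beta')|$. Since $EG_n(\beta',a) \to G(\beta',a)$ and $|G(\beta',a) - G(\beta,a)| \le \epsilon\int|f_{yz}(\lambda)|\,d\lambda$ it is enough to show that $X_n \to 0$ almost surely. Thus letting $X_n(j) = \sum_{t=1}^{n-j}\left(y_tz_{t+j} - \gamma_{yz}(-j)\right)$

$$X_n \le \sup_{\|\beta'-\beta\|<\delta}n^{-1}\sum_{j=0}^{n}|c_j'|\,|X_n(j)| \le K_0n^{-1}\left(\sum_{j=0}^{n}j^{-2}|X_n(j)|^2\right)^{1/2}$$

where $K_0 = \sup_{\|\beta'-\beta\|<\delta}\sum_{j=0}^{\infty}j\,|c_j'|$. From $|EX_n| \le (EX_n^2)^{1/2}$ we consider

$$EX_n^2 \le K_0^2n^{-2}\sum_{j=0}^{n}j^{-2}\left(EX_n(j)^2\right).$$

Since

$$EX_n(j)^2 \le n\sum_{k}\left|\gamma_{yy}(k)\gamma_{zz}(k) + \gamma_{yz}(k)\gamma_{yz}(k)\right| + n\sum_{j,k,l}|\kappa_4(j,k,l)|$$

for all $j$ there is a $K_1$ such that $EX_n^2 \le K_2n^{-1}$ where $K_2 = \frac{\pi^2}{6}K_1K_0^2$. For $n/2 < n_1 < n$ consider $X_{n,n_1} = \sup_{\|\beta'-\beta\|<\delta}|X_n(\beta') - X_{n_1}(\beta')|$ such that

$$X_{n,n_1} \le K_0(n-n_1)(nn_1)^{-1}\left(\sum_{j=0}^{n_1}j^{-2}|X_{n_1}(j)|^2\right)^{1/2} + K_0n^{-1}\left(\sum_{j=0}^{n}j^{-2}\left(\sum_{t=\max(n_1-j,1)}^{n-j}\left(y_tz_{t+j} - \gamma_{yz}(-j)\right)\right)^2\right)^{1/2}.$$

Now

$$K_0^2(n-n_1)^2(nn_1)^{-2}E\sum_{j=0}^{n_1}j^{-2}|X_{n_1}(j)|^2 \le K_2(n-n_1)n^{-2}$$

and

$$K_0^2n^{-2}E\sum_{j=0}^{n}j^{-2}\left(\sum_{t=\max(n_1-j,1)}^{n-j}\left(y_tz_{t+j} - \gamma_{yz}(-j)\right)\right)^2 \le K_2(n-n_1)n^{-2}$$

together with $E(Y+Z)^2 \le EY^2 + 2(EY^2EZ^2)^{1/2} + EZ^2$ implies that $EX_{n,n_1}^2 \le K_2n^{-2}(n-n_1)$. It now follows from Lemma 3 in Gaposhkin (1980) that $X_n \to 0$ almost surely. The result then follows since $\epsilon$ was arbitrary.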

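For part ii) we show that $\underline{\lim}\,\|G_n(\beta_n,a)\| > 0$ almost surely where $\underline{\lim} := \liminf_n$. By Fatou's Lemma

$$0 \le P\left(\underline{\lim}\,\|G_n(\beta_n,a)\|^2 = 0\right) \le \underline{\lim}\,P\left(\|G_n(\beta_n,a)\|^2 = 0\right).$$

The only two cases that need to be considered are $\underline{\lim}\,\mathrm{Var}(G_n(\beta_n,a)) > 0$ and $\underline{\lim}\,\mathrm{Var}(G_n(\beta_n,a)) = 0$. Let $c_j^n = \int C^{-1}(\beta_n,\lambda)e^{i\lambda j}d\lambda$. For the first case note that $\underline{\lim}\,\mathrm{Var}(G_n(\beta_n,a)) > 0$ implies that $\underline{\lim}\sum_{j=0}^{n}(c_j^n/\sqrt{n})^2 > 0$ by the second inequality in (A.3). Let $w_j^n$ denote the coefficients $c_j^n$ normalized by their sum over $j = 0,\ldots,n$, such that for $X_n = \frac{1}{n}\sum_j\sum_{t=1+j}^{n}w_j^n\left(y_{t-j}z_t - \gamma_{yz}(j)\right)$ we have $EX_n = 0$ and $EX_n^2 = O(n^{-1})$. By the Toeplitz Lemma it then follows that $\sum_{j=1}^{n}w_j^n\gamma_{yz}(j)$ converges, such that $\frac{1}{n}\sum_jw_j^n\sum_{t=1+j}^{n}y_{t-j}z_t = O_p(1)$. This implies that $G_n(\beta_n,a)$ is of the order of the normalizing sum in probability, which leads to $P(\|G_n(\beta_n,a)\| < \epsilon) \to 0$ for any $\epsilon < \infty$. For the second case one needs to show that $\underline{\lim}\,\|EG_n(\beta_n,a)\|^2 > 0$, since for any $\epsilon > 0$

$$P\left(\|G_n(\beta_n,a)\|^2 < \epsilon\right) \le P\left(\|G_n(\beta_n,a) - EG_n(\beta_n,a)\|^2 > \|EG_n(\beta_n,a)\|^2 - \epsilon\right)$$

such that the result follows from the Markov inequality after taking $\underline{\lim}$ on both sides.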
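To see this we first show that $0 = \underline{\lim}\,\mathrm{Var}(G_n(\beta_n,a)) \Leftrightarrow \underline{\lim}\sum_{j=0}^{n}(c_j^n/\sqrt{n})^2 = 0$. Now $\Leftarrow$ follows immediately from

$$0 \le \mathrm{Var}(G_n(\beta_n,a)) = E\left(\frac{1}{n}\sum_{j=0}^{n-1}c_j^nX_n(j)\right)^2 \le \sum_{j=0}^{n-1}\left(c_j^n/\sqrt{n}\right)^2 \cdot E\sum_{j=0}^{n-1}\frac{1}{n}X_n(j)^2 \to 0. \qquad (A.3)$$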

To show $\Rightarrow$ assume $0 < \underline{\lim}\sum_{j=0}^{n}(c_j^n/\sqrt{n})^2 < \infty$. Then there exists a subsequence $n_k$ with $n_k < n_{k+1}$ and $k \to \infty$ such that $0 < \sum_{j=0}^{n_k}(c_j^{n_k}/\sqrt{n_k})^2 < \infty$, $\mathrm{Var}(G_{n_k}(\beta_{n_k},a))$ exists and $\mathrm{Var}(G_{n_k}(\beta_{n_k},a)) > 0$ $\forall k$. Note that $\mathrm{Var}(G_{n_k}(\beta_{n_k},a)) = 0$ only if $\exists a \in l^2$ such that $y_t = \sum_{j=1}^{\infty}a_jy_{t-j}$ almost surely. But this is impossible since there is no $x \in l^2$ such that $\varepsilon_t = \sum_{j=1}^{\infty}x_j\varepsilon_{t-j}$ almost surely (see the Proof of Lemma 4.2). This contradicts $\underline{\lim}\,\mathrm{Var}(G_n(\beta_n,a)) = 0$. Therefore $\liminf\sum_{j=0}^{n}(c_j^n/\sqrt{n})^2 = 0$ implying that $\liminf\|EG_n(\beta_n,a) - G(\beta_n,a)\|^2 = 0$ since

$$\|EG_n(\beta_n,a) - G(\beta_n,a)\|^2 \le \left(\sum_{j>n}|c_j^n|^2\right)^{1/2}\left(\sum_{j>n}\|\gamma_{yz}(j)\|^2\right)^{1/2}.$$

At the same time $\underline{\lim}\,\|G(\beta_n,a)\| > 0$ by the identification assumptions such that $\underline{\lim}\,\|EG_n(\beta_n,a)\|^2 > 0$. This shows that $\underline{\lim}\,P(\|G_n(\beta_n,a)\|^2 = 0) = 0$. ■

B. Appendix - Proofs

Proof of Lemma 3.1 From the definition of $\beta_n$ it follows that

$$0 \le \liminf_n\|G_n(\beta_n,a)\|^2 \le \limsup_n\|G_n(\beta_n,a)\|^2 \le \limsup_n\|G_n(\beta_0,a)\|^2. \qquad (B.1)$$

From Lemma (A.5) it follows that $G_n(\beta_0,a) \to G(\beta_0,a) = 0$ almost surely. Thus

$$\limsup_n\|G_n(\beta_n,a)\| = \lim_n\|G_n(\beta_n,a)\| = 0 \text{ almost surely.} \qquad (B.2)$$

Let $E$ be the probability one event in Lemma (A.6). Now consider the sequence $\beta_n \in \Theta$. If $\beta_n$ does not converge to $\beta_0$ then by compactness of $\Theta$ there exists a subsequence $\beta_{n_k}$ such that $\beta_{n_k} \to \beta \in \Theta$ with $\beta \ne \beta_0$. By Lemma (A.6) $\liminf_k\|G_{n_k}(\beta_{n_k},a)\| > 0$ a.s., contradicting (B.2). Therefore $\beta_n \to \beta_0$. ■

Proof of Proposition 3.2 We only prove that Assumption (C2) holds. We first note that $f_{yz}(\lambda) = C(\beta_0,e^{-i\lambda})\,l_a(-\lambda)$ where $l_a(\lambda) = \sum_{k=1}^{\infty}a_ke^{-i\lambda k}$ such that

$$G(\beta,a) = (2\pi)^{-1}\int_{-\pi}^{\pi}\psi(\beta,e^{-i\lambda})\,l_a(-\lambda)\,d\lambda.$$

Letting

$$\psi(\beta,e^{-i\lambda}) = C^{-1}(\beta,e^{-i\lambda})\,C(\beta_0,e^{-i\lambda})$$

it is clear that $\psi(\beta_0,e^{-i\lambda}) = 1$ such that $G(\beta_0,a) = 0$.

We need to show that for $C(\beta,e^{-i\lambda}) = \theta(e^{-i\lambda})/\phi(e^{-i\lambda})$ there is no other $\beta \in \Theta$ such that $G(\beta,a) = 0$. Now for any $\beta \in \Theta$ the function $\psi(\beta,e^{-i\lambda})$ is rational with numerator and denominator degrees equal to $p+q$. The orthogonality conditions can now be written as

$$(2\pi)^{-1}\int_{-\pi}^{\pi}\left(\psi(\beta,e^{-i\lambda}) - 1\right)l_a(-\lambda)\,d\lambda = 0. \qquad (B.3)$$

We want to show that the only function $\psi(.,e^{-i\lambda}) - 1 : [-\pi,\pi] \to \mathbb{C}$ satisfying this condition is

$$\psi(.,e^{-i\lambda}) - 1 = 0. \qquad (B.4)$$

If the assumptions of Lemma (3.2) hold then the only value $\beta$ for which $\psi(\beta,e^{-i\lambda}) - 1 = 0$ is $\beta_0$.

Now showing that $\psi(\beta,e^{-i\lambda}) - 1 = 0$ is equivalent to showing $\phi(e^{-i\lambda})\theta_0(e^{-i\lambda})/\phi_0(e^{-i\lambda}) - \theta(e^{-i\lambda}) = 0$ since the polynomial $\theta(e^{-i\lambda})$ is not zero for any $\lambda \in [-\pi,\pi]$ for $\beta \in \Theta$. Here $\phi(e^{-i\lambda})\theta_0(e^{-i\lambda})/\phi_0(e^{-i\lambda})$ is the lag polynomial of an ARMA(p,p+q) process with parameters $\phi_{0,1},\ldots,\phi_{0,p},\vartheta_1,\ldots,\vartheta_{p+q}$. We denote the impulse response function of this process by the coefficients $\psi_j$ such that $\phi(e^{-i\lambda})\theta_0(e^{-i\lambda})/\phi_0(e^{-i\lambda})$ has a one sided Fourier expansion $\sum_{j=0}^{\infty}\psi_je^{-i\lambda j}$ where dependence of $\psi_j$ on $\phi$ is suppressed for notational efficiency.

For $j \ge p+q+1$ the coefficients $\psi_j$ satisfy the well known restriction

$$\psi_j - \phi_{0,1}\psi_{j-1} - \ldots - \phi_{0,p}\psi_{j-p} = 0. \qquad (B.5)$$
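Restriction (B.5) can be verified numerically for a small example. The sketch below assumes $p = 2$ and $q = 1$ with hypothetical coefficient values, uses the sign convention $\phi_0(L) = 1 - \phi_{0,1}L - \phi_{0,2}L^2$, and computes the impulse response of $\phi(L)\theta_0(L)/\phi_0(L)$ by the usual recursion. The function name and all numbers are illustrative.

```python
import numpy as np

def impulse_response(ar, ma, n_terms):
    """Coefficients psi_0..psi_{n_terms-1} of ma(L)/ar(L).

    Both polynomials are given as arrays [1, c_1, c_2, ...] in powers of L.
    """
    psi = np.zeros(n_terms)
    for j in range(n_terms):
        acc = ma[j] if j < len(ma) else 0.0
        for i in range(1, min(j, len(ar) - 1) + 1):
            acc -= ar[i] * psi[j - i]
        psi[j] = acc
    return psi

# Hypothetical values: phi_0 (true AR part), phi (alternative AR part), theta_0 (true MA part).
phi0 = np.array([0.5, -0.2])
phi = np.array([0.3, 0.1])
theta0 = np.array([0.4])
p, q = len(phi0), len(theta0)

ar0 = np.concatenate(([1.0], -phi0))                       # phi_0(L)
ma = np.convolve(np.concatenate(([1.0], -phi)),            # phi(L) * theta_0(L), degree p+q
                 np.concatenate(([1.0], theta0)))
psi = impulse_response(ar0, ma, 30)

# Left-hand side of (B.5) for j = p+q+1, ..., 29.
residual = np.array([psi[j] - phi0 @ psi[j - 1::-1][:p]
                     for j in range(p + q + 1, 30)])
# The same expression at j = p+q, where the restriction need not hold.
at_boundary = psi[p + q] - phi0 @ psi[p + q - 1::-1][:p]
print(np.abs(residual).max(), at_boundary)
```

The recursion holds from lag $p+q+1$ onward, while at lag $p+q$ the expression equals the last coefficient of $\phi(L)\theta_0(L)$ and is generally nonzero.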


If we denote the infinite dimensional vector $[\psi_1,\ldots]'$ by $\psi$ and the vector of coefficients in $\theta(e^{-i\lambda})$ by $\theta = [\theta_1,\ldots,\theta_q]'$ then Condition (B.3) has a matrix representation

$$a'\psi - [a_1,\ldots,a_q]\,\theta = 0. \qquad (B.6)$$

Establishing (B.4) amounts to showing that $[\psi_1,\ldots,\psi_q]' = \theta$ and $[\psi_{q+1},\ldots] = 0$ where $[\psi_{q+1},\ldots] = 0$ implies $[\psi_1,\ldots,\psi_q]' = \theta_0$ by the identification conditions. Define $A_q = [a_1,\ldots,a_q]$, $B = [a_{q+1},\ldots]$ and $x = \psi - [\theta',0,\ldots]'$ such that Equations (B.5) and (B.6) can be written as $Rx = 0$ where

$$R = \begin{pmatrix} A_q & B \\ 0 & R_{22} \end{pmatrix}$$

and $R_{22}$ contains the coefficients of (B.5). The result now follows if $R$ is of full rank. We can distinguish three cases. If $p = 0$ then $R_{22} = I$ and $R$ is of full rank if $A_q$ is of full rank. If $q = 0$ then the system reduces to $Bx = 0$ and $R_{22}x = 0$ with $x = \psi$. The condition $R_{22}\psi = 0$ is satisfied for any of the $p$ linearly independent solutions to (B.5). It is therefore necessary that $B$ is of full row rank and $B^{\perp} \cap N(R_{22}) = 0$, where $N(R_{22})$ is the $p$-dimensional null space of $R_{22}$ and $B^{\perp}$ is the orthogonal complement of the linear span of the row vectors of $B$. Thus the only solution is $x = 0$. Finally if both $q \ne 0$ and $p \ne 0$ then we need to distinguish between $q \ge p$ and $q < p$. First consider $q \ge p$. Define


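$$D = \begin{pmatrix} 1 & 0 & \cdots & \\ -\phi_{0,1} & 1 & 0 & \cdots \\ \vdots & \ddots & \ddots & \\ -\phi_{0,p} & \cdots & -\phi_{0,1} & 1 \\ 0 & \ddots & & \ddots \end{pmatrix} \quad\text{and}\quad C = \begin{pmatrix} -\phi_{0,p} & \cdots & -\phi_{0,1} \\ & \ddots & \vdots \\ 0 & & -\phi_{0,p} \end{pmatrix}$$

such that

$$R = \begin{pmatrix} A_d & B \\ C' & D \end{pmatrix}$$

where $A_d = [a_1,\ldots,a_d]$, $B = [a_{d+1},\ldots]$, $C' = [(0,C)',0,\ldots]$ and $D$ conformingly. Note that $D$ is invertible such that $|R| = |D|\,|A_d - C'D^{-1}B|$. It can be checked that $C'D^{-1}B = 0$ such that $R$ is of full rank if $A_d$ is of full rank. On the other hand if $q < p$ then we need again that $[A_q,B]^{\perp} \cap N(C,D) = 0$. Then there is no element $x \in N(C,D)$, $x \ne 0$, such that $[A_q,B]x = 0$ such that $Rx = 0$ implies $x = 0$ as required.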

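The previous arguments can break down on the boundary of $\Theta$ because it is possible that $\psi(\beta,e^{-i\lambda})$ has an expansion with constant coefficients. In that case $\int_{-\pi}^{\pi}\psi(\beta,e^{-i\lambda})\,l_a(-\lambda)\,d\lambda = K\,l_a(0)$ for some constant $K$. We therefore need to require $l_a(0) \ne 0$ in order to ensure that

$$\liminf_n\left|\int_{-\pi}^{\pi}C^{-1}(\beta_n,e^{-i\lambda})f_{yz}(\lambda)\,d\lambda\right| > 0 \text{ for } \beta_n \in \Theta \text{ and } \beta_n \to \beta \in \partial\Theta.$$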


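Proof of Theorem 3.3 A familiar mean value expansion leads to

$$o_p(1) = \left[\frac{\partial}{\partial\beta'}G_n(\beta_n)\right]'\sqrt{n}\,G_n(\beta_n) = (M + o_p(1))\left[\sqrt{n}\,G_n(\beta_0) + \frac{\partial}{\partial\beta'}G_n(\bar\beta_n)\sqrt{n}(\beta_n - \beta_0)\right]$$

where $M = \sigma^2(2\pi)^{-1}\int_{-\pi}^{\pi}\partial\ln C(\beta_0,e^{-i\lambda})/\partial\beta\;l_a(\lambda)'\,d\lambda$ and $(\bar\beta_n^i - \beta_0) = o_p(1)$ for $i = 1,\ldots,d$. Here $\frac{\partial}{\partial\beta'}G_n(\bar\beta_n)$ is the matrix with rows $\frac{\partial}{\partial\beta'}G_n^i(\bar\beta_n^i)$ for $i = 1,\ldots,d$. It is well known that the multivariate mean value expansion can be made exact by evaluating each row $\frac{\partial}{\partial\beta'}G_n^i(\beta)$ at a different point $\bar\beta_n^i$.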

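First, convergence of $\frac{\partial}{\partial\beta'}G_n(\bar\beta_n)'$ to $M$ can be shown by the same arguments as convergence of $G(\beta,a)$ noting that both $y_t$ and $z_t$ are strictly stationary and $\partial\ln C(\beta,e^{-i\lambda})/\partial\beta$ is uniformly continuous on $[-\pi,\pi] \times U$ for $U \subset \Theta$, $U$ compact, $\beta_0 \in \Theta$. A set $U$ with these properties exists by local compactness of the parameter space. The details are omitted.

Next, turn to $\sqrt{n}\,G_n(\beta_0) = \sqrt{n}\,(2\pi)^{-1}\int_{-\pi}^{\pi}C^{-1}(\beta_0,e^{-i\lambda})I_{n,yz}(\lambda)\,d\lambda$. From Lemma (A.2) it follows that

$$\sqrt{n}\int_{-\pi}^{\pi}C^{-1}(\beta_0,e^{-i\lambda})I_{n,yz}(\lambda)\,d\lambda - \sqrt{n}\int_{-\pi}^{\pi}I_{n,\varepsilon z}(\lambda)\,d\lambda = o_p(1).$$

Using Lemma (A.3) then shows that $\sqrt{n}\,G_n(\beta_0) \to N\left(0,\sum_{l=1}^{\infty}\sum_{k=1}^{\infty}\alpha_{k,l}\,a_ka_l'\right)$ where it should be noted that

$$\sum_{l=1}^{\infty}\sum_{k=1}^{\infty}\alpha_{k,l}\,a_ka_l' = A'\Omega A.$$

The result now follows from

$$\partial\ln C(\beta_0,e^{-i\lambda})/\partial\beta = \sum_{k=1}^{\infty}b_ke^{-i\lambda k}$$

such that $(2\pi)^{-1}\int_{-\pi}^{\pi}\partial\ln C(\beta_0,e^{-i\lambda})/\partial\beta\;l_a(\lambda)'\,d\lambda = \sum_{k=1}^{\infty}b_ka_k'$. ■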

Proof of Lemma 4.1 First, note that $\Omega_m$ is symmetric since $\sigma(k,l) = E(\varepsilon_t^2\varepsilon_{t-k}\varepsilon_{t-l}) = \sigma(l,k)$ for $k,l > 0$, $k \ne l$. Then, by the Schur decomposition (see Magnus and Neudecker, 1988) for all $m$ there exists an orthogonal matrix $S_m$ such that $S_m'\Omega_mS_m = \Lambda_m$, where $\Lambda_m$ is diagonal with elements $\lambda_j^m$, $j = 1,\ldots,m$. Now, for any $x_m \in \mathbb{R}^m$, $x_m \ne 0$, we have $x_m'\Omega_mx_m = E\left(\sum_i x_{i,m}\varepsilon_t\varepsilon_{t-i}\right)^2 > 0$ where the inequality is strict by Assumption (A1). So $\Omega_m$ is positive definite such that $\lambda_j^m > 0$ $\forall j,m$. This shows that $\Omega_m$ has full rank. ■

Proof of Lemma 4.2 From Assumption (A1) it is clear that $\Omega x \in l^2$ for all $x \in l^2$. It remains to show that $\ker\Omega = 0$. Assume there is $x \in l^2$ such that $x \ne \{0,0,\ldots\}$ and $\Omega x = 0$. Then also $x'\Omega x = 0$ which can be written as $E\left(\sum_{i=1}^{\infty}x_i\varepsilon_t\varepsilon_{t-i}\right)^2 = 0$. But this is only possible if $\sum_i x_i\varepsilon_t\varepsilon_{t-i} = 0$ with probability one. Now $\sum_i x_i\varepsilon_t\varepsilon_{t-i} = 0$ a.s. if $\varepsilon_t\varepsilon_{t-i} = 0$ a.s. or the functions $\varepsilon_{t-i}$ are linearly dependent a.s.

If $\varepsilon_{t-i}$ are linearly dependent then $\exists a \in l^2$, $a \ne 0$ such that $\sum_i a_i\varepsilon_{t-i} = 0$ a.s. Without loss of generality $a_1 \ne 0$. If $a_i = 0$ for all $i = 2,3,\ldots$ then $\sum_i a_i\varepsilon_{t-i} = 0$ a.s. is trivially contradicted. Now assume $a_i \ne 0$ for at least one $i = 2,3,\ldots$ such that $\varepsilon_{t-1} = -a_1^{-1}\sum_{i=2}^{\infty}a_i\varepsilon_{t-i}$ a.s. But then $E(\varepsilon_{t-1}\mid\mathcal{F}_{t-2}) = -a_1^{-1}\sum_{i=2}^{\infty}a_i\varepsilon_{t-i}$ a.s. so that $E(\varepsilon_{t-1}\mid\mathcal{F}_{t-2}) \ne 0$ with positive probability. This contradicts the martingale difference assumption.

On the other hand if $\varepsilon_t\varepsilon_{t-i} = 0$ a.s. for all $i$ then $\varepsilon_t^2\varepsilon_{t-i}^2 = 0$ a.s. But then $E(\varepsilon_t^2\varepsilon_{t-i}^2) = 0$ for all $i$ which contradicts Assumption (A1). Therefore $\Omega x = 0$ can only hold if $x = 0$. Thus $\ker\Omega = 0$. Symmetry of $\Omega$ now implies that $\mathrm{Im}\,\Omega = l^2$, therefore $\Omega^{-1}$ exists and is bounded on $l^2$. ■


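Proof of Lemma 4.3 By Assumption (A1) we know that $\sum_k\sum_l|\sigma(k,l)| \le B$, thus $\sum_k|\sigma(k,l)| \le B$ for any $l$. Therefore for any fixed $l$, $\sigma(k,l) \to 0$ as $k \to \infty$. This holds also if the roles of $k$ and $l$ are reversed. Also $\sum_k|\sigma(k,k)| \le B$ such that $\sigma(k,k) \to 0$ as $k \to \infty$. Define the infinite dimensional matrices $S_{12}^m$, $S_{21}^m$ and $S_{22}^m$ according to the following partition

$$\Omega = \begin{pmatrix} \Omega_m & S_{12}^m \\ S_{21}^m & S_{22}^m \end{pmatrix}.$$

Then $\mathrm{tr}(S_{12}^{m\prime}S_{12}^m) = \sum_{k=m+1}^{\infty}\sum_{l=1}^{m}|\sigma(k,l)|^2 \to 0$, $\mathrm{tr}(S_{21}^{m\prime}S_{21}^m) \to 0$ and $\mathrm{tr}\left((S_{22}^m - \sigma^4I)'(S_{22}^m - \sigma^4I)\right) \to 0$ as $m \to \infty$. Then define the infinite dimensional approximation matrix

$$\Omega_m^{*} = \begin{pmatrix} \Omega_m & 0 \\ 0 & \sigma^4I \end{pmatrix}.$$

Clearly $\Omega_m^{*-1}$ exists $\forall m$ by Lemma (4.1) and the partitioned inverse formula. We now have

$$\Omega^{-1} - \Omega_m^{*-1} = \Omega_m^{*-1}(\Omega_m^{*} - \Omega)\Omega^{-1}$$

such that

$$\|\Omega^{-1} - \Omega_m^{*-1}\| \le \|\Omega_m^{*-1}\|\,\|\Omega - \Omega_m^{*}\|\,\|\Omega^{-1}\|$$

where $\|.\|$ is the matrix norm defined by $\|A\| = \sup_{\|x\|\le 1}\|Ax\|_2$. First show that $\|\Omega_m^{*-1}\|$ is bounded. By the partitioned inverse formula

$$\Omega_m^{*-1} = \begin{pmatrix} \Omega_m^{-1} & 0 \\ 0 & \sigma^{-4}I \end{pmatrix}$$

such that $\|\Omega_m^{*-1}\| \le \|\Omega_m^{-1}\| + \sigma^{-4}$. We have shown in Lemma (4.1) that the smallest eigenvalue $\lambda_1^m$ of $\Omega_m$ is nonzero. Then by a familiar inequality for all $x \in \mathbb{R}^m$, $x'\Omega_m^{-1}x/x'x \le 1/\lambda_1^m < \infty$ $\forall m$ such that $\|\Omega_m^{-1}\| < \infty$ since for finite $m$ all norms are equivalent. Also $\|\Omega^{-1}\| < \infty$ by Lemma (4.2) and

$$\|\Omega - \Omega_m^{*}\| = \sup_{\|x\|\le 1}\left[\sum_{k=1}^{m}\left(\sum_{l=m+1}^{\infty}\sigma(k,l)x_l\right)^2 + \sum_{k=m+1}^{\infty}\left(\sum_{l=1}^{\infty}\sigma(k,l)x_l\right)^2\right]^{1/2}$$

$$\le \sup_{\|x\|\le 1}\sum_{k=1}^{m}\sum_{l=m+1}^{\infty}|\sigma(k,l)|\,|x_l| + \sup_{\|x\|\le 1}\sum_{k=m+1}^{\infty}\sum_{l=1}^{\infty}|\sigma(k,l)|\,|x_l|$$

$$\le 2\sum_{l=m+1}^{\infty}\sum_{k=1}^{\infty}|\sigma(k,l)| \to 0 \text{ as } m \to \infty.$$

Thus $\|\Omega^{-1} - \Omega_m^{*-1}\| \to 0$ as $m \to \infty$. ■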

Proof  of  Theorem  4.4  For  all  m  fixed  it  follows  from  standard  results  that 

But  since  for  any  sequence  {xm}  such  that  xm  >  0  for  all  m  it  follows  that  lim  infm  xm  >  0 
the  above  inequality  also  holds  in  the  limit.  Since  both  p(P)  G  I1  and  a(A)  G  i1  it  follows 
from  a  bounded  convergence  argument  that  limm  P^Am  exists  and  is  finite.  If  a  G  A"  then 
the  inverse  exists  as  well.  The  same  arguments  can  be  used  to  show  that  limm  j4'mf2mJ4m 
exists  and  is  finite. 

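Finally define $[\Omega^{-1}]_m = [\omega_{i,j}]_{i,j \le m} = (\Omega_m - \Sigma_{12}\Sigma_{22}^{-1}\Sigma_{12}')^{-1}$ where $\omega_{ij}$ are the elements of the infinite dimensional inverse $\Omega^{-1}$. Let $\omega_{ij}^m$ be the elements of the approximating finite dimensional inverse $\Omega_m^{-1}$ if $i,j \le m$ and 0 otherwise. Since $\omega_{ij}^m \to \omega_{ij}$ for all $i,j$ as $m \to \infty$ and $\sup |\omega_{ij}| < \infty$, it follows that for $\epsilon > 0$, $|\omega_{ij}^m| > |\omega_{ij}| + \epsilon$ only for a finite number of $m$ for all $i,j$. Let $B_{ij} = \sup_m |\omega_{ij}^m|$. Then $B_{ij}$ is finite for all $i,j$. Now

$$\lim_m P_m'\Omega_m^{-1}P_m = \lim_m \sum_{i=1}^{\infty}\sum_{j=1}^{\infty} b_i b_j'\, \omega_{ij}^m.$$

The function $f_m(i,j) = b_i b_j'\, \omega_{ij}^m$ is integrable for all $m$ and dominated by $b_i b_j' B_{ij}$, which is also integrable with respect to the counting measure. Therefore by dominated convergence $\lim_m P_m'\Omega_m^{-1}P_m = \sum_{i=1}^{\infty}\sum_{j=1}^{\infty} b_i b_j'\, \omega_{ij}$, as had to be shown. ■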
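The convergence of the truncated quadratic forms $P_m'\Omega_m^{-1}P_m$ can be illustrated numerically. The sketch below is only an illustration under assumptions that are not in the paper: a scalar case ($d = 1$) with $b_k = 0.5^{k-1}$, a rank-one $\Sigma$, $\sigma^4 = 1$, and a large finite matrix standing in for the infinite dimensional $\Omega$. The function name `truncated_quadratic_form` is hypothetical.

```python
import numpy as np

def truncated_quadratic_form(b, omega, m):
    """Return b_m' Omega_m^{-1} b_m using the leading m x m block of omega."""
    b_m = b[:m]
    omega_m = omega[:m, :m]
    return float(b_m @ np.linalg.solve(omega_m, b_m))

# Assumed toy model: summable b_k and summable sigma(k, l).
n = 80
k = np.arange(1, n + 1)
b = 0.5 ** (k - 1)                         # stand-in for the Fourier coefficients b_k
w = 0.6 ** k
omega = np.eye(n) + 0.5 * np.outer(w, w)   # Omega = sigma^4 I + Sigma with sigma^4 = 1

values = [truncated_quadratic_form(b, omega, m) for m in (5, 10, 20, 40, 80)]
for m, value in zip((5, 10, 20, 40, 80), values):
    print(m, value)
```

With summable coefficients the successive values settle quickly, which is consistent with the dominated convergence argument above.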

Proof of Theorem 5.1 We first show that $a(A) \in \mathcal{A}$ for $A = P'\Omega^{-1}$. From Assumption (A1) it follows that $\Omega$ maps $l^1$ into $l^1$. To see this write $\Omega = \Sigma + \sigma^4 I$ where the matrix $\Sigma$ consists of elements $\sigma(k,l)$. For $x \in l^1$ we have $\Omega x = \Sigma x + \sigma^4 x$ with $\Sigma x \in l^1$ because of the summability restrictions on $\sigma(k,l)$. From Lemma 4.2 we know that for $x \in l^1 \subset l^2$ we have $\Omega^{-1}x \in l^2$. Assume $\Omega^{-1}x \notin l^1$. Then $x = \Omega\Omega^{-1}x = \Sigma\Omega^{-1}x + \sigma^4\Omega^{-1}x$. But $\Sigma\Omega^{-1}x \in l^1$. Thus $\|\sigma^4\Omega^{-1}x\|_1 = \|x - \Sigma\Omega^{-1}x\|_1 \le \|x\|_1 + \|\Sigma\Omega^{-1}x\|_1$ and $\|x\|_1$ becomes unbounded because $\sigma^4\Omega^{-1}x \notin l^1$. But this contradicts the assumption that $x \in l^1$. It follows that the image of $l^1$ under $\Omega^{-1}$ is also in $l^1$, which in turn implies that $\sum_{k=1}^{\infty} |\omega_{lk}| < \infty$ for all $l$. This can be seen by considering the image under $\Omega^{-1}$ of the $l$-th unit vector. Since $P \in \mathcal{A}$ it now follows that $P'\Omega^{-1} \in \mathcal{A}$.

Next we show that $a(A) \in \mathcal{A}'$. Recall that the optimal instrument is defined by $A' = P'\Omega^{-1}$ or $A'\Omega = P'$. Interpret $P$ as a set of $d$ vectors in $l^2$. The row rank of $A'$ is therefore the same as the column rank of $P$. Remember that $P = [b_1, b_2, \dots]'$ with $b_k = (2\pi)^{-1}\int_{-\pi}^{\pi} \partial \ln C(\beta_0, e^{-i\lambda})/\partial\beta \; e^{i\lambda k}\, d\lambda$. For $C(\beta_0, e^{-i\lambda}) = \theta_0(\lambda)/\phi_0(\lambda)$ we have

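$$\partial \ln C(\beta_0, e^{-i\lambda})/\partial\beta = \left[ \frac{e^{-i\lambda}}{\phi_0(\lambda)} \;\; \cdots \;\; \frac{e^{-i\lambda p}}{\phi_0(\lambda)} \;\; \frac{e^{-i\lambda}}{\theta_0(\lambda)} \;\; \cdots \;\; \frac{e^{-i\lambda q}}{\theta_0(\lambda)} \right]'.$$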

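Define the expansions of $\phi_0^{-1}(z) = \sum_{j=0}^{\infty} \psi_{\phi,j} z^j$ and $\theta_0^{-1}(z) = \sum_{j=0}^{\infty} \psi_{\theta,j} z^j$. The coefficients in the expansion satisfy the difference equation $\psi_{\phi,j} - \phi_{0,1}\psi_{\phi,j-1} - \dots - \phi_{0,p}\psi_{\phi,j-p} = 0$, which has $p$ linearly independent solutions. A similar expression holds for $\psi_{\theta,j}$. Set $\psi_{\phi,j} = \psi_{\theta,j} = 0$ for $j < 0$. Then

$$b_k = [\, \psi_{\phi,k-1} \;\; \cdots \;\; \psi_{\phi,k-p} \;\; \psi_{\theta,k-1} \;\; \cdots \;\; \psi_{\theta,k-q} \,]'.$$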
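The recursion for the expansion coefficients and the resulting vectors $b_k$ can be sketched in a few lines. This is a minimal illustration, not the paper's implementation: the function names `inverse_poly_coeffs` and `b_vector` and the parameter values are hypothetical, and the code assumes that both polynomials are written as $1 - c_1 z - \dots - c_n z^n$, so that the same difference equation applies to $\theta_0$ as to $\phi_0$.

```python
def inverse_poly_coeffs(coeffs, n_terms):
    """Coefficients psi_j of 1 / (1 - c_1 z - ... - c_n z^n), j = 0..n_terms-1.

    Uses the difference equation psi_j = c_1 psi_{j-1} + ... + c_n psi_{j-n}
    with psi_0 = 1 and psi_j = 0 for j < 0.
    """
    psi = []
    for j in range(n_terms):
        value = 1.0 if j == 0 else 0.0
        for i, c in enumerate(coeffs, start=1):
            if j - i >= 0:
                value += c * psi[j - i]
        psi.append(value)
    return psi

def b_vector(k, psi_phi, psi_theta, p, q):
    """Stack psi_{phi,k-1..k-p} and psi_{theta,k-1..k-q}; negative lags give 0."""
    def at(psi, j):
        return psi[j] if j >= 0 else 0.0
    ar_part = [at(psi_phi, k - i) for i in range(1, p + 1)]
    ma_part = [at(psi_theta, k - i) for i in range(1, q + 1)]
    return ar_part + ma_part

# Assumed example: phi(z) = 1 - 0.5 z + 0.06 z^2 and theta(z) = 1 - 0.4 z.
psi_phi = inverse_poly_coeffs([0.5, -0.06], 8)
psi_theta = inverse_poly_coeffs([0.4], 8)
b_1 = b_vector(1, psi_phi, psi_theta, p=2, q=1)
b_3 = b_vector(3, psi_phi, psi_theta, p=2, q=1)
print(b_1, b_3)
```

For `k = 1` only the lag-zero coefficients are available, so the second autoregressive entry is zero, in line with the convention $\psi_{\phi,j} = 0$ for $j < 0$.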

Any set of $d = p + q$ vectors $b_{k_1}, b_{k_2}, \dots, b_{k_d}$ is linearly independent because of the linear independence of the solutions to $\phi_0(L)x = 0$ and $\theta_0(L)x = 0$ together with the requirement that $\phi(L)$ and $\theta(L)$ have no common zeros and that $\phi_p \ne 0$ and $\theta_q \ne 0$. Thus $P$ has full column rank. But this means that $A$ has full column rank as well, thus establishing that $A_d = [a_1, \dots, a_d]$ is nonsingular. For the case where $q = 0$ we note that $P$ is a matrix of $p$ linearly independent vectors. The space $P^{\perp}$ is spanned by vectors of the form $v_j = [0, \dots, \xi_k^{-1}, -1, 0, \dots]'$ where $\xi_k$ is a root of $\phi_0(L)$. This can be seen from $x \in P^{\perp} \Leftrightarrow \sum_{j=0}^{\infty} \psi_{\phi,j-l}\, x_j = 0$ for all $l = 0, 1, \dots, p-1$. Thus for all $l$ it has to hold that $\phi_0^{-1}(L)L^l x = \prod_k (1 - \xi_k^{-1}L)^{-1}L^l x = 0 \Leftrightarrow (1 - \xi_k^{-1}L)^{-1}L^l x = 0$ for at least one $\xi_k^{-1}$. Now assume that there exists $x \ne 0$ with $\phi_0(L)x = 0$ and $\Omega^{-1}x \in P^{\perp}$. It follows that $x \in l^1$ by stationarity and $\Omega^{-1}x \in l^2$ by invertibility of $\Omega^{-1}$. Since $\Omega^{-1}x \in P^{\perp}$ there must exist constants $c_j$ such that $\Omega^{-1}x = \sum_j c_j v_j$. Recursive solution for $c_j$ shows that $c_j \to \infty$ as $j \to \infty$. Thus $\sum_j c_j v_j \notin l^2$, which contradicts $\Omega^{-1}x = \sum_j c_j v_j$. Thus for any $x \ne 0$ with $\phi_0(L)x = 0$ it follows that $A'x \ne 0$.

If $0 < q < p$ then the matrix $P'$ contains $q$ rows which are determined by $\theta_0(L)$. As argued before, $P'$ has full row rank. By the previous argument it follows that for at least one row $p_i'$ of $P'$ and any $x \ne 0$ such that $\phi_0(L)x = 0$ we have $p_i'\Omega^{-1}x \ne 0$. Thus for all $x \ne 0$ with $\phi_0(L)x = 0$ we have $P'\Omega^{-1}x \ne 0$.

We also need to establish $A'\iota \ne 0$ where $\iota$ is an infinite dimensional vector of ones. The sum of the Fourier coefficients $P'\iota$ is proportional to $\phi_0(1)^{-1}$ and $\theta_0(1)^{-1}$ such that $P'\iota \ne 0$. Since $A = P'\Omega^{-1}$ is contained in the linear span of the vectors of $P$ and $P$ is not orthogonal to $\iota$, it follows that $A$ is also not orthogonal to $\iota$.

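Finally we show that $a(A) \in \mathcal{A}''$. First, $\int \eta(\beta_0, \lambda)\, l_a(-\lambda)'\, d\lambda = P'\Omega^{-1}P$ and $P'\Omega^{-1}P = E(\varepsilon_t^2 z_t z_t')$. Now, $\det P'\Omega^{-1}P = 0 \Leftrightarrow \exists\, \ell \in \mathbb{R}^d$, $\ell \ne 0$ such that $E(\varepsilon_t^2 z_t z_t')\ell = 0 \Leftrightarrow \ell' E(\varepsilon_t^2 z_t z_t')\ell = 0$. Then for $x_t := \ell' z_t$, $0 \le E(\varepsilon_t^2 x_t^2) = E x_t^2 E[\varepsilon_t^2 \mid \mathcal{F}_{t-1}] = 0 \Rightarrow E[\varepsilon_t^2 \mid \mathcal{F}_{t-1}] = 0$ a.s. or $x_t^2 = 0$ a.s. Now, clearly $E[\varepsilon_t^2 \mid \mathcal{F}_{t-1}] = 0$ a.s. is ruled out by Assumption (A1). Then $x_t^2 = 0$ a.s. implies $x_t = \ell' z_t = 0$ a.s. But we have shown before that the column rank of $A$ is full so that $\ell' z_t = 0$ a.s. is impossible. ■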

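Proof of Theorem 5.2 We need to show that $\sum_{k=1}^{\infty} |k|\, |a_{k,j}|$ for $j = 1, \dots, d$ is bounded. Since $P \in \mathcal{A}$ we can write $P' = P'\Omega^{-1}\Omega = P'\Omega^{-1}(\sigma^4 I + \Sigma)$. Define the vector $\ell_k = k e_k$ where $e_k$ is the $k$-th unit vector. Then

$$P'\ell_k = P'\Omega^{-1}(\sigma^4 I + \Sigma)\ell_k. \qquad \text{(B.7)}$$

Now, the sequence $\{P'\ell_k\}_{k=1}^{\infty} \in \mathcal{A}$ and $\Sigma\ell_k \in l^1$ for all $k$. Therefore, by the fact that $a(P'\Omega^{-1}) \in \mathcal{A}$ and the summability assumption of Lemma (5.2), $\{P'\Omega^{-1}\Sigma\ell_k\}_{k=1}^{\infty} \in \mathcal{A}$. From (B.7) we have

$$\sigma^4 \left| P'\Omega^{-1}\ell_k \right| = \left| P'\ell_k - P'\Omega^{-1}\Sigma\ell_k \right| \le \left| P'\ell_k \right| + \left| P'\Omega^{-1}\Sigma\ell_k \right|$$

where $|\cdot|$ is a vector norm on $\mathbb{R}^d$. Without loss of generality we use $|x| = \sup_j |x_j|$ for $x \in \mathbb{R}^d$. Summing over $k$ gives

$$\sigma^4 \sum_{k=1}^{\infty} \left| P'\Omega^{-1}\ell_k \right| \le \sum_{k=1}^{\infty} \left| P'\ell_k \right| + \sum_{k=1}^{\infty} \left| P'\Omega^{-1}\Sigma\ell_k \right| < \infty.$$

Note that $\left| P'\Omega^{-1}\ell_k \right| = k \left| \sum_{i=1}^{\infty} b_i\, \omega_{ik} \right|$. This establishes the result. ■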
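The weighted summability of the instrument coefficients can also be inspected numerically. The sketch below is illustrative only and rests on assumptions not taken from the paper: a scalar case ($d = 1$) with $b_k = 0.5^{k-1}$, a rank-one $\Sigma$, $\sigma^4 = 1$, and a finite matrix in place of the infinite dimensional $\Omega$. The helper name `weighted_instrument_sum` is hypothetical.

```python
import numpy as np

def weighted_instrument_sum(b, omega, upto):
    """Partial sum of k * |a_k| for k = 1..upto, where a = Omega^{-1} b.

    Because omega is symmetric, a_k equals sum_i b_i * omega^{ik}, the
    quantity appearing at the end of the proof.
    """
    a = np.linalg.solve(omega, b)
    k = np.arange(1, upto + 1)
    return float(np.sum(k * np.abs(a[:upto])))

# Assumed toy model with geometrically decaying b_k and sigma(k, l).
n = 80
idx = np.arange(1, n + 1)
b = 0.5 ** (idx - 1)
w = 0.6 ** idx
omega = np.eye(n) + 0.5 * np.outer(w, w)   # Omega = sigma^4 I + Sigma with sigma^4 = 1

partial_sums = [weighted_instrument_sum(b, omega, m) for m in (10, 20, 40, 80)]
print(partial_sums)
```

The partial sums increase and then level off, as one would expect if $\sum_k k\,|a_k|$ is finite in this toy setting.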